PREFACE


In 1957, in his Princeton doctoral dissertation, Hugh Everett, III, proposed
a new interpretation of quantum mechanics that denies the existence of a
separate classical realm and asserts that it makes sense to talk about a
state vector for the whole universe. This state vector never collapses, and
hence reality as a whole is rigorously deterministic. This reality, which is
described jointly by the dynamical variables and the state vector, is not
the reality we customarily think of, but is a reality composed of many
worlds. By virtue of the temporal development of the dynamical variables the
state vector decomposes naturally into orthogonal vectors, reflecting a
continual splitting of the universe into a multitude of mutually
unobservable but equally real worlds, in each of which every good
measurement has yielded a definite result and in most of which the familiar
statistical quantum laws hold.

In addition to his short thesis Everett wrote a much larger exposition of
his ideas, which was never published. The present volume contains both of
these works, together with a handful of papers by others on the same theme.
Looked at in one way, Everett’s interpretation calls for a return to naive
realism and the old-fashioned idea that there can be a direct correspondence
between formalism and reality. Because physicists have become more
sophisticated than this, and above all because the implications of his
approach appear to them so bizarre, few have taken Everett seriously.
Nevertheless his basic premise provides such a stimulating framework for
discussions of the quantum theory of measurement that this volume should be
on every quantum theoretician’s shelf.




“... a picture, incomplete yet not false, of the universe as Ts’ui Pen
conceived it to be. Differing from Newton and Schopenhauer, ... [he] did not
think of time as absolute and uniform. He believed in an infinite series of
times, in a dizzily growing, ever spreading network of diverging, converging
and parallel times. This web of time — the strands of which approach one
another, bifurcate, intersect or ignore each other through the centuries —
embraces every possibility. We do not exist in most of them. In some you
exist and not I, while in others I do, and you do not, and in yet others
both of us exist. In this one, in which chance has favored me, you have come
to my gate. In another, you, crossing the garden, have found me dead. In yet
another, I say these very same words, but am an error, a phantom.”

Jorge  Luis  Borges,  The  Garden  of  Forking  Paths 


“Actualities seem to float in a wider sea of possibilities from out of which
they were chosen; and somewhere, indeterminism says, such possibilities
exist, and form part of the truth.”


William  James 
THE THEORY OF THE UNIVERSAL WAVE FUNCTION


Hugh  Everett,  III 

I.  INTRODUCTION 

We begin, as a way of entering our subject, by characterizing a particular
interpretation of quantum theory which, although not representative of the
more careful formulations of some writers, is the most common form
encountered in textbooks and university lectures on the subject.

A physical system is described completely by a state function $\psi$, which
is an element of a Hilbert space, and which furthermore gives information
only concerning the probabilities of the results of various observations
which can be made on the system. The state function $\psi$ is thought of as
objectively characterizing the physical system, i.e., at all times an
isolated system is thought of as possessing a state function, independently
of our state of knowledge of it. On the other hand, $\psi$ changes in a
causal manner so long as the system remains isolated, obeying a differential
equation. Thus there are two fundamentally different ways in which the state
function can change:¹

Process 1: The discontinuous change brought about by the observation of a
quantity with eigenstates $\phi_1, \phi_2, \ldots$, in which the state
$\psi$ will be changed to the state $\phi_j$ with probability
$|(\psi, \phi_j)|^2$.

Process 2: The continuous, deterministic change of state of the (isolated)
system with time according to a wave equation
$\partial\psi/\partial t = U\psi$, where $U$ is a linear operator.
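The contrast between the two processes can be sketched for a two-component state. This is only an illustration under stated assumptions: the names `evolve`, `probabilities`, and `observe` are hypothetical, the basis is an arbitrary choice of eigenstates, and a fixed rotation stands in for the finite-time effect of a linear wave equation.

```python
import math
import random

def evolve(psi, theta):
    """Process 2 (sketch): a continuous, deterministic, linear change of state.
    A rotation by angle theta is used here because it preserves the norm."""
    a, b = psi
    c, s = math.cos(theta), math.sin(theta)
    return (c * a - s * b, s * a + c * b)

def probabilities(psi, eigenstates):
    """Process 1 weights: the squared magnitude of the inner product of psi
    with each eigenstate phi_j."""
    return [abs(sum(complex(p).conjugate() * x for p, x in zip(phi, psi))) ** 2
            for phi in eigenstates]

def observe(psi, eigenstates, rng):
    """Process 1 (sketch): a discontinuous jump to one phi_j, chosen at random
    with the weights computed above."""
    weights = probabilities(psi, eigenstates)
    j = rng.choices(range(len(eigenstates)), weights=weights)[0]
    return j, eigenstates[j]

basis = [(1.0, 0.0), (0.0, 1.0)]        # hypothetical eigenstates phi_1, phi_2
psi = evolve((1.0, 0.0), math.pi / 4)   # deterministic: same input, same output
j, collapsed = observe(psi, basis, random.Random(0))  # probabilistic outcome
```

Repeating `evolve` with the same arguments always returns the same state, whereas `observe` depends on the random source; that asymmetry is the one the text goes on to examine.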


¹ We use here the terminology of von Neumann [17].
The question of the consistency of the scheme arises if one contemplates
regarding the observer and his object-system as a single (composite)
physical system. Indeed, the situation becomes quite paradoxical if we allow
for the existence of more than one observer. Let us consider the case of one
observer A, who is performing measurements upon a system S, the totality
(A + S) in turn forming the object-system for another observer, B.

If we are to deny the possibility of B’s use of a quantum mechanical
description (wave function obeying wave equation) for A + S, then we must be
supplied with some alternative description for systems which contain
observers (or measuring apparatus). Furthermore, we would have to have a
criterion for telling precisely what type of systems would have the
preferred positions of “measuring apparatus” or “observer” and be subject to
the alternate description. Such a criterion is probably not capable of
rigorous formulation.

On the other hand, if we do allow B to give a quantum description to A + S,
by assigning a state function $\psi^{A+S}$, then, so long as B does not
interact with A + S, its state changes causally according to Process 2, even
though A may be performing measurements upon S. From B’s point of view,
nothing resembling Process 1 can occur (there are no discontinuities), and
the question of the validity of A’s use of Process 1 is raised. That is,
apparently either A is incorrect in assuming Process 1, with its
probabilistic implications, to apply to his measurements, or else B’s state
function, with its purely causal character, is an inadequate description of
what is happening to A + S.

To better illustrate the paradoxes which can arise from strict adherence to
this interpretation we consider the following amusing, but extremely
hypothetical drama.

Isolated somewhere out in space is a room containing an observer, A, who is
about to perform a measurement upon a system S. After performing his
measurement he will record the result in his notebook. We assume that he
knows the state function of S (perhaps as a result of previous measurement),
and that it is not an eigenstate of the measurement he is about to perform.
A, being an orthodox quantum theorist, then believes that the outcome of his
measurement is undetermined and that the process is correctly described by
Process 1.

In the meantime, however, there is another observer, B, outside the room,
who is in possession of the state function of the entire room, including S,
the measuring apparatus, and A, just prior to the measurement. B is only
interested in what will be found in the notebook one week hence, so he
computes the state function of the room for one week in the future according
to Process 2. One week passes, and we find B still in possession of the
state function of the room, which this equally orthodox quantum theorist
believes to be a complete description of the room and its contents. If B’s
state function calculation tells beforehand exactly what is going to be in
the notebook, then A is incorrect in his belief about the indeterminacy of
the outcome of his measurement. We therefore assume that B’s state function
contains non-zero amplitudes over several of the notebook entries.

At this point, B opens the door to the room and looks at the notebook
(performs his observation). Having observed the notebook entry, he turns to
A and informs him in a patronizing manner that since his (B’s) wave function
just prior to his entry into the room, which he knows to have been a
complete description of the room and its contents, had non-zero amplitude
over other than the present result of the measurement, the result must have
been decided only when B entered the room, so that A, his notebook entry,
and his memory about what occurred one week ago had no independent objective
existence until the intervention by B. In short, B implies that A owes his
present objective existence to B’s generous nature which compelled him to
intervene on his behalf. However, to B’s consternation, A does not react
with anything like the respect and gratitude he should exhibit towards B,
and at the end of a somewhat heated reply, in which A conveys in a colorful
manner his opinion of B and his beliefs, he rudely punctures B’s ego by
observing that if B’s view is correct, then he has no reason to feel
complacent, since the whole present situation may have no objective
existence, but may depend upon the future actions of yet another observer.

It is now clear that the interpretation of quantum mechanics with which we
began is untenable if we are to consider a universe containing more than one
observer. We must therefore seek a suitable modification of this scheme, or
an entirely different system of interpretation. Several alternatives which
avoid the paradox are:

Alternative  1:  To  postulate  the  existence  of  only  one  observer  in  the 
universe.  This  is  the  solipsist  position,  in  which  each  of  us  must 
hold  the  view  that  he  alone  is  the  only  valid  observer,  with  the 
rest  of  the  universe  and  its  inhabitants  obeying  at  all  times  Process 
2  except  when  under  his  observation. 

This  view  is  quite  consistent,  but  one  must  feel  uneasy  when,  for 
example,  writing  textbooks  on  quantum  mechanics,  describing  Process  1, 
for  the  consumption  of  other  persons  to  whom  it  does  not  apply. 

Alternative  2:  To  limit  the  applicability  of  quantum  mechanics  by 
asserting  that  the  quantum  mechanical  description  fails  when 
applied  to  observers,  or  to  measuring  apparatus,  or  more  generally 
to  systems  approaching  macroscopic  size. 

If we try to limit the applicability so as to exclude measuring apparatus,
or in general systems of macroscopic size, we are faced with the difficulty
of sharply defining the region of validity. For what n might a group of n
particles be construed as forming a measuring device so that the quantum
description fails? And to draw the line at human or animal observers, i.e.,
to assume that all mechanical apparata obey the usual laws, but that they
are somehow not valid for living observers, does violence to the so-called
principle of psycho-physical parallelism,² and constitutes a view to be
avoided, if possible. To do justice to this principle we must insist that we
be able to conceive of mechanical devices (such as servomechanisms), obeying
natural laws, which we would be willing to call observers.

Alternative 3: To admit the validity of the state function description,
but to deny the possibility that B could ever be in possession of
the state function of A + S. Thus one might argue that a determination
of the state of A would constitute such a drastic intervention
that A would cease to function as an observer.

The first objection to this view is that no matter what the state of A + S
is, there is in principle a complete set of commuting operators for which it
is an eigenstate, so that, at least, the determination of these quantities
will not affect the state nor in any way disrupt the operation of A. There
are no fundamental restrictions in the usual theory about the knowability of
any state functions, and the introduction of any such restrictions to avoid
the paradox must therefore require extra postulates.

The  second  objection  is  that  it  is  not  particularly  relevant  whether  or 
not  B  actually  knows  the  precise  state  function  of  A  +  S.  If  he  merely 
believes  that  the  system  is  described  by  a  state  function,  which  he  does 
not  presume  to  know,  then  the  difficulty  still  exists.  He  must  then  believe 
that  this  state  function  changed  deterministically,  and  hence  that  there 
was  nothing  probabilistic  in  A’s  determination. 


² In the words of von Neumann ([17], p. 418): “...it is a fundamental
requirement of the scientific viewpoint — the so-called principle of the
psycho-physical parallelism — that it must be possible so to describe the
extra-physical process of the subjective perception as if it were in reality
in the physical world — i.e., to assign to its parts equivalent physical
processes in the objective environment, in ordinary space.”
Alternative 4: To abandon the position that the state function is a
complete description of a system. The state function is to be regarded
not as a description of a single system, but of an ensemble
of systems, so that the probabilistic assertions arise naturally
from the incompleteness of the description.

It is assumed that the correct complete description, which would presumably
involve further (hidden) parameters beyond the state function alone, would
lead to a deterministic theory, from which the probabilistic aspects arise
as a result of our ignorance of these extra parameters in the same manner as
in classical statistical mechanics.

Alternative 5: To assume the universal validity of the quantum description,
by the complete abandonment of Process 1. The general
validity of pure wave mechanics, without any statistical assertions,
is assumed for all physical systems, including observers and measuring
apparata. Observation processes are to be described completely
by the state function of the composite system which includes
the observer and his object-system, and which at all times
obeys the wave equation (Process 2).

This brief list of alternatives is not meant to be exhaustive, but has been
presented in the spirit of a preliminary orientation. We have, in fact,
omitted one of the foremost interpretations of quantum theory, namely the
position of Niels Bohr. The discussion will be resumed in the final chapter,
when we shall be in a position to give a more adequate appraisal of the
various alternate interpretations. For the present, however, we shall
concern ourselves only with the development of Alternative 5.

It is evident that Alternative 5 is a theory of many advantages. It has the
virtue of logical simplicity and it is complete in the sense that it is
applicable to the entire universe. All processes are considered equally
(there are no “measurement processes” which play any preferred role), and
the principle of psycho-physical parallelism is fully maintained. Since the
universal validity of the state function description is asserted, one can
regard the state functions themselves as the fundamental entities, and one
can even consider the state function of the whole universe. In this sense
this theory can be called the theory of the “universal wave function,” since
all of physics is presumed to follow from this function alone. There
remains, however, the question whether or not such a theory can be put into
correspondence with our experience.

The present thesis is devoted to showing that this concept of a universal
wave mechanics, together with the necessary correlation machinery for its
interpretation, forms a logically self-consistent description of a universe
in which several observers are at work.

We shall be able to introduce into the theory systems which represent
observers. Such systems can be conceived as automatically functioning
machines (servomechanisms) possessing recording devices (memory) and which
are capable of responding to their environment. The behavior of these
observers shall always be treated within the framework of wave mechanics.
Furthermore, we shall deduce the probabilistic assertions of Process 1 as
subjective appearances to such observers, thus placing the theory in
correspondence with experience. We are then led to the novel situation in
which the formal theory is objectively continuous and causal, while
subjectively discontinuous and probabilistic. While this point of view thus
shall ultimately justify our use of the statistical assertions of the
orthodox view, it enables us to do so in a logically consistent manner,
allowing for the existence of other observers. At the same time it gives a
deeper insight into the meaning of quantized systems, and the role played by
quantum mechanical correlations.

In order to bring about this correspondence with experience for the pure
wave mechanical theory, we shall exploit the correlation between subsystems
of a composite system which is described by a state function. A subsystem of
such a composite system does not, in general, possess an independent state
function. That is, in general a composite system cannot be represented by a
single pair of subsystem states, but can be represented only by a
superposition of such pairs of subsystem states. For example, the
Schrödinger wave function for a pair of particles, $\psi(x_1, x_2)$, cannot
always be written in the form $\psi = \phi(x_1)\,\eta(x_2)$, but only in the
form $\psi = \sum_{i,j} a_{ij}\,\phi^i(x_1)\,\eta^j(x_2)$. In the latter
case, there is no single state for Particle 1 alone or Particle 2 alone, but
only the superposition of such cases.
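For the smallest case, two basis functions per particle, whether a coefficient array factors into a single pair of subsystem states can be checked directly. The sketch below assumes a 2 x 2 array of coefficients; the helper names `outer` and `is_product_state` are hypothetical, and the determinant test is a standard linear-algebra fact (a nonzero 2 x 2 matrix has rank one exactly when its determinant is zero) rather than anything stated in the text.

```python
import math

def outer(phi, eta):
    """Coefficients a[i][j] = phi[i] * eta[j] of a single product state."""
    return [[p * e for e in eta] for p in phi]

def is_product_state(a, tol=1e-12):
    """True when the nonzero 2x2 coefficient matrix a has rank one,
    i.e. when its determinant vanishes (up to the tolerance tol)."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return abs(det) < tol

r = 1 / math.sqrt(2)
product = outer((0.6, 0.8), (r, r))     # one pair of subsystem states
correlated = [[r, 0.0], [0.0, r]]       # needs a superposition of two pairs

print(is_product_state(product), is_product_state(correlated))
```

In the `correlated` case neither particle can be assigned a state of its own, which is the situation the paragraph describes.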

In  fact,  to  any  arbitrary  choice  of  state  for  one  subsystem  there  will 
correspond  a  relative  state  for  the  other  subsystem,  which  will  generally 
be  dependent  upon  the  choice  of  state  for  the  first  subsystem,  so  that  the 
state  of  one  subsystem  is  not  independent,  but  correlated  to  the  state  of 
the  remaining  subsystem.  Such  correlations  between  systems  arise  from 
interaction  of  the  systems,  and  from  our  point  of  view  all  measurement  and 
observation  processes  are  to  be  regarded  simply  as  interactions  between 
observer  and  object-system  which  produce  strong  correlations. 

Let one regard an observer as a subsystem of the composite system: observer
+ object-system. It is then an inescapable consequence that after the
interaction has taken place there will not, generally, exist a single
observer state. There will, however, be a superposition of the composite
system states, each element of which contains a definite observer state and
a definite relative object-system state. Furthermore, as we shall see, each
of these relative object-system states will be, approximately, the
eigenstates of the observation corresponding to the value obtained by the
observer which is described by the same element of the superposition. Thus,
each element of the resulting superposition describes an observer who
perceived a definite and generally different result, and to whom it appears
that the object-system state has been transformed into the corresponding
eigenstate. In this sense the usual assertions of Process 1 appear to hold
on a subjective level to each observer described by an element of the
superposition. We shall also see that correlation plays an important role in
preserving consistency when several observers are present and allowed to
interact with one another (to “consult” one another) as well as with other
object-systems.
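A toy version of this picture can be written down for an idealized, exact measurement. Everything here is an assumption made for illustration: the functions `measure` and `relative_system_state`, the memory labels of the form "saw ...", and the choice to store the composite state as a dictionary of amplitudes. The text itself only claims that the relative states are approximately eigenstates.

```python
import math

def measure(system_amplitudes):
    """Idealized correlating interaction: each object-system eigenstate i,
    with amplitude c_i, becomes paired with the observer memory 'saw i'."""
    return {(i, f"saw {i}"): c for i, c in system_amplitudes.items()}

def relative_system_state(joint, observer_state):
    """Normalized object-system state relative to one observer memory state."""
    branch = {i: c for (i, obs), c in joint.items() if obs == observer_state}
    norm = math.sqrt(sum(abs(c) ** 2 for c in branch.values()))
    return {i: c / norm for i, c in branch.items()}

# Object-system prepared in a superposition, not an eigenstate.
joint = measure({"up": 0.6, "down": 0.8})
relative = relative_system_state(joint, "saw up")
```

After the interaction `joint` has two elements and no single observer state; relative to the memory "saw up" the object-system is entirely in the state "up", so to that observer the state appears to have jumped to an eigenstate.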
In order to develop a language for interpreting our pure wave mechanics for
composite systems we shall find it useful to develop quantitative
definitions for such notions as the “sharpness” or “definiteness” of an
operator A for a state $\psi$, and the “degree of correlation” between the
subsystems of a composite system or between a pair of operators in the
subsystems, so that we can use these concepts in an unambiguous manner. The
mathematical development of these notions will be carried out in the next
chapter (II) using some concepts borrowed from Information Theory.³

We shall develop there the general definitions of information and
correlation, as well as some of their more important properties. Throughout
Chapter II we shall use the language of probability theory to facilitate the
exposition, and because it enables us to introduce in a unified manner a
number of concepts that will be of later use. We shall nevertheless
subsequently apply the mathematical definitions directly to state functions,
by replacing probabilities by square amplitudes, without, however, making
any reference to probability models.

Having set the stage, so to speak, with Chapter II, we turn to quantum
mechanics in Chapter III. There we first investigate the quantum formalism
of composite systems, particularly the concept of relative state functions,
and the meaning of the representation of subsystems by non-interfering
mixtures of states characterized by density matrices. The notions of
information and correlation are then applied to quantum mechanics. The final
section of this chapter discusses the measurement process, which is regarded
simply as a correlation-inducing interaction between subsystems of a single
isolated system. A simple example of such a measurement is given and
discussed, and some general consequences of the superposition principle are
considered.


³ The theory originated by Claude E. Shannon [19].
This will be followed by an abstract treatment of the problem of Observation
(Chapter IV). In this chapter we make use only of the superposition
principle, and general rules by which composite system states are formed of
subsystem states, in order that our results shall have the greatest
generality and be applicable to any form of quantum theory for which these
principles hold. (Elsewhere, when giving examples, we restrict ourselves to
the non-relativistic Schrödinger Theory for simplicity.) The validity of
Process 1 as a subjective phenomenon is deduced, as well as the consistency
of allowing several observers to interact with one another.

Chapter V supplements the abstract treatment of Chapter IV by discussing a
number of diverse topics from the point of view of the theory of pure wave
mechanics, including the existence and meaning of macroscopic objects in the
light of their atomic constitution, amplification processes in measurement,
questions of reversibility and irreversibility, and approximate measurement.

The final chapter summarizes the situation, and continues the discussion of
alternate interpretations of quantum mechanics.


II.  PROBABILITY,  INFORMATION,  AND  CORRELATION 


The present chapter is devoted to the mathematical development of the
concepts of information and correlation. As mentioned in the introduction we
shall use the language of probability theory throughout this chapter to
facilitate the exposition, although we shall apply the mathematical
definitions and formulas in later chapters without reference to probability
models. We shall develop our definitions and theorems in full generality,
for probability distributions over arbitrary sets, rather than merely for
distributions over real numbers, in which we are mainly interested at
present. We take this course because it is as easy as the restricted
development, and because it gives a better insight into the subject.

The first three sections develop definitions and properties of information
and correlation for probability distributions over finite sets only. In
section four the definition of correlation is extended to distributions over
arbitrary sets, and the general invariance of the correlation is proved.
Section five then generalizes the definition of information to distributions
over arbitrary sets. Finally, as illustrative examples, sections seven and
eight give brief applications to stochastic processes and classical
mechanics, respectively.

§1.  Finite  joint  distributions 

We assume that we have a collection of finite sets,
$\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$, whose elements are denoted
by $x_i \in \mathcal{X}$, $y_j \in \mathcal{Y}$, ..., $z_k \in \mathcal{Z}$,
etc., and that we have a joint probability distribution,
$P = P(x_i, y_j, \ldots, z_k)$, defined on the cartesian product of the sets,
which represents the probability of the combined event $x_i, y_j, \ldots,$
and $z_k$. We then denote by X, Y, ..., Z the random variables whose values
are the elements of the sets $\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$,
with probabilities given by P.
For any subset Y, ..., Z of a set of random variables W, ..., X, Y, ..., Z,
with joint probability distribution
$P(w_i, \ldots, x_j, y_k, \ldots, z_\ell)$, the marginal distribution,
$P(y_k, \ldots, z_\ell)$, is defined to be:

(1.1)  $P(y_k, \ldots, z_\ell) = \sum_{i,\ldots,j} P(w_i, \ldots, x_j, y_k, \ldots, z_\ell)$ ,

which represents the probability of the joint occurrence of
$y_k, \ldots, z_\ell$, with no restrictions upon the remaining variables.
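The sum in (1.1) can be sketched for a distribution stored as a dictionary from outcome tuples to probabilities. The representation, the function name `marginal`, and the sample numbers are all assumptions made for illustration.

```python
from collections import defaultdict

def marginal(joint, keep):
    """Sum the joint distribution over every variable whose position is not
    listed in `keep`, leaving a distribution over the kept variables."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in keep)] += p
    return dict(out)

# Hypothetical joint distribution over pairs (w, y).
joint = {("w1", "y1"): 0.1, ("w1", "y2"): 0.3,
         ("w2", "y1"): 0.2, ("w2", "y2"): 0.4}

p_y = marginal(joint, keep=(1,))   # distribution of Y alone
p_w = marginal(joint, keep=(0,))   # distribution of W alone
```

Keys of the result are tuples, so the marginal of Y is indexed as `p_y[("y1",)]`; this keeps the same function usable when more than one variable is kept.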

For any subset Y, ..., Z of a set of random variables the conditional
distribution, conditioned upon the values $W = w_i, \ldots, X = x_j$ for any
remaining subset W, ..., X, and denoted by
$P^{w_i,\ldots,x_j}(y_k, \ldots, z_\ell)$, is defined to be:¹

(1.2)  $P^{w_i,\ldots,x_j}(y_k, \ldots, z_\ell) = \dfrac{P(w_i, \ldots, x_j, y_k, \ldots, z_\ell)}{P(w_i, \ldots, x_j)}$ ,

which represents the probability of the joint event
$Y = y_k, \ldots, Z = z_\ell$, conditioned by the fact that W, ..., X are
known to have taken the values $w_i, \ldots, x_j$, respectively.
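A minimal sketch of (1.2) for two variables follows. The dictionary representation and the name `conditional` are assumptions; returning `None` when the conditioning event has probability zero is one possible way of expressing that the quantity is left undefined in that case.

```python
def conditional(joint, given_index, given_value):
    """Conditional distribution of the remaining variables, given that the
    variable at position `given_index` took the value `given_value`.
    Returns None when that value has probability zero."""
    p_given = sum(p for o, p in joint.items() if o[given_index] == given_value)
    if p_given == 0:
        return None
    return {o[:given_index] + o[given_index + 1:]: p / p_given
            for o, p in joint.items() if o[given_index] == given_value}

# Hypothetical joint distribution over pairs (w, y).
joint = {("w1", "y1"): 0.1, ("w1", "y2"): 0.3,
         ("w2", "y1"): 0.2, ("w2", "y2"): 0.4}

p_y_given_w1 = conditional(joint, 0, "w1")
undefined = conditional(joint, 0, "w3")   # "w3" never occurs
```

Each conditional distribution sums to one, since the numerator in (1.2) summed over the remaining variables is the denominator.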

For any numerical valued function $F(y_k, \ldots, z_\ell)$, defined on the
elements of the cartesian product of $\mathcal{Y}, \ldots, \mathcal{Z}$, the
expectation, denoted by Exp [F], is defined to be:

(1.3)  $\mathrm{Exp}[F] = \sum_{k,\ldots,\ell} P(y_k, \ldots, z_\ell)\, F(y_k, \ldots, z_\ell)$ .


We note that if $P(y_k, \ldots, z_\ell)$ is a marginal distribution of some
larger distribution $P(w_i, \ldots, x_j, y_k, \ldots, z_\ell)$ then

(1.4)  $\mathrm{Exp}[F] = \sum_{k,\ldots,\ell} \Big( \sum_{i,\ldots,j} P(w_i, \ldots, x_j, y_k, \ldots, z_\ell) \Big) F(y_k, \ldots, z_\ell) = \sum_{i,\ldots,j,k,\ldots,\ell} P(w_i, \ldots, x_j, y_k, \ldots, z_\ell)\, F(y_k, \ldots, z_\ell)$ ,

so that if we wish to compute Exp [F] with respect to some joint
distribution it suffices to use any marginal distribution of the original
distribution which contains at least those variables which occur in F.

¹ We regard it as undefined if $P(w_i, \ldots, x_j) = 0$. In this case
$P(w_i, \ldots, x_j, y_k, \ldots, z_\ell)$ is necessarily zero also.
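The statement that a marginal distribution suffices can be checked numerically. In this sketch the distribution, the function F(y) = y squared, and the helper name `expectation` are all illustrative assumptions.

```python
def expectation(dist, f):
    """Sum of P(outcome) * F(outcome) over all outcomes of the distribution."""
    return sum(p * f(outcome) for outcome, p in dist.items())

# Hypothetical joint distribution over pairs (w, y); F depends on y only.
joint = {("w1", 1): 0.1, ("w1", 2): 0.3, ("w2", 1): 0.2, ("w2", 2): 0.4}

# Marginal distribution of y, obtained by summing over w.
p_y = {}
for (w, y), p in joint.items():
    p_y[y] = p_y.get(y, 0.0) + p

via_joint = expectation(joint, lambda wy: wy[1] ** 2)
via_marginal = expectation(p_y, lambda y: y ** 2)
```

Both routes give the same number up to rounding, because the only difference is the order in which the sums over w and y are carried out.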

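We shall also occasionally be interested in conditional expectations, which
we define as:

(1.5)  $\mathrm{Exp}^{w_i,\ldots,x_j}[F] = \sum_{k,\ldots,\ell} P^{w_i,\ldots,x_j}(y_k, \ldots, z_\ell)\, F(y_k, \ldots, z_\ell)$ ,

and we note the following easily verified rules for expectations:

(1.6)  $\mathrm{Exp}\big[\mathrm{Exp}^{w_i,\ldots,x_j}[F]\big] = \mathrm{Exp}[F]$ ,

(1.7)  $\mathrm{Exp}^{u_i,\ldots,v_j}\big[\mathrm{Exp}^{u_i,\ldots,v_j,w_k,\ldots,x_\ell}[F]\big] = \mathrm{Exp}^{u_i,\ldots,v_j}[F]$ ,

(1.8)  $\mathrm{Exp}[F+G] = \mathrm{Exp}[F] + \mathrm{Exp}[G]$ .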
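Reading the inner expectation in rule (1.6) as a conditional one, which is an interpretive assumption here, the rule says that averaging the conditional expectations over the conditioning variable recovers the plain expectation. The sketch below checks this for a small hypothetical distribution; `conditional_expectation` is an illustrative name, and every value of w is assumed to have nonzero probability.

```python
def conditional_expectation(joint, w, f):
    """Average of F(y) under the conditional distribution of y given w.
    Assumes P(w) > 0."""
    p_w = sum(p for (wi, _), p in joint.items() if wi == w)
    return sum(p / p_w * f(y) for (wi, y), p in joint.items() if wi == w)

# Hypothetical joint distribution over pairs (w, y).
joint = {("w1", 1): 0.1, ("w1", 2): 0.3, ("w2", 1): 0.2, ("w2", 2): 0.4}

def f(y):
    return y ** 2

# Marginal distribution of w.
p_w = {}
for (w, _), p in joint.items():
    p_w[w] = p_w.get(w, 0.0) + p

averaged = sum(p_w[w] * conditional_expectation(joint, w, f) for w in p_w)
direct = sum(p * f(y) for (_, y), p in joint.items())
```

The two quantities `averaged` and `direct` agree up to rounding.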

We should like finally to comment upon the notion of independence. Two
random variables X and Y with joint distribution $P(x_i, y_j)$ will be said
to be independent if and only if $P(x_i, y_j)$ is equal to $P(x_i)P(y_j)$
for all i, j. Similarly, the groups of random variables (U...V), (W...X),
..., (Y...Z) will be called mutually independent groups if and only if
$P(u_i, \ldots, v_j, w_k, \ldots, x_\ell, \ldots, y_m, \ldots, z_n)$ is
always equal to
$P(u_i, \ldots, v_j)\,P(w_k, \ldots, x_\ell) \cdots P(y_m, \ldots, z_n)$.

Independence means that the random variables take on values which are not
influenced by the values of other variables with respect to which they are
independent. That is, the conditional distribution of one of two independent
variables, Y, conditioned upon the value $x_i$ for the other, is independent
of $x_i$, so that knowledge about one variable tells nothing of the other.
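The product criterion for two variables can be tested directly on a finite table. The function name `is_independent`, the tolerance, and the two sample distributions are assumptions made for this sketch.

```python
def is_independent(joint, tol=1e-12):
    """True when P(x, y) equals P(x) * P(y) for every pair (x, y), where the
    single-variable distributions are the marginals of the joint table."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return all(abs(joint.get((x, y), 0.0) - px[x] * py[y]) <= tol
               for x in px for y in py)

# Built as a product of P(x) = (0.4, 0.6) and P(y) = (0.3, 0.7).
independent = {("a", 0): 0.12, ("a", 1): 0.28, ("b", 0): 0.18, ("b", 1): 0.42}

# Knowing x fixes y completely, so the variables are strongly correlated.
correlated = {("a", 0): 0.5, ("b", 1): 0.5}

print(is_independent(independent), is_independent(correlated))
```

Pairs missing from the table are treated as having probability zero, which is why the correlated example fails the test.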

§2.  Information  for  finite  distributions 

Suppose that we have a single random variable X, with distribution
$P(x_i)$. We then define² a number, $I_X$, called the information of X, to
be:

(2.1)  $I_X = \sum_i P(x_i) \ln P(x_i) = \mathrm{Exp}[\ln P(x_i)]$ ,

which is a function of the probabilities alone and not of any possible
numerical values of the $x_i$’s themselves.

² This definition corresponds to the negative of the entropy of a
probability distribution as defined by Shannon [19].

The information is essentially a measure of the sharpness of a probability
distribution, that is, an inverse measure of its “spread.” In this
respect information plays a role similar to that of variance. However, it
has a number of properties which make it a measure of “sharpness”
superior to the variance, not the least of which is the fact that it
can be defined for distributions over arbitrary sets, while variance is
defined only for distributions over real numbers.

Any change in the distribution $P(x_i)$ which “levels out” the probabilities
decreases the information. The information has the value zero for “perfectly
sharp” distributions, in which the probability is one for one of the $x_i$ and
zero for all others, and ranges downward to $-\ln n$ for distributions over
n elements which are equal over all of the $x_i$. The fact that the information
is nonpositive is no liability, since we are seldom interested in the
absolute information of a distribution, but only in differences.
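The limiting values quoted above are easy to confirm numerically. The following minimal sketch is an illustration only; the example distributions and the helper name `information` are assumptions, not taken from the text.

```python
import math

def information(probs):
    """I = sum_i P_i ln P_i, equation (2.1), with the convention 0 ln 0 = 0."""
    return sum(p * math.log(p) for p in probs if p > 0)

sharp = [1.0, 0.0, 0.0, 0.0]      # perfectly sharp distribution
skewed = [0.7, 0.1, 0.1, 0.1]     # hypothetical intermediate case
# "Level out" the skewed distribution by mixing it half-and-half with uniform.
leveled = [0.5 * (p + 0.25) for p in skewed]
uniform = [0.25] * 4              # equal probabilities over n = 4 elements

for name, dist in [("sharp", sharp), ("skewed", skewed),
                   ("leveled", leveled), ("uniform", uniform)]:
    print(name, information(dist))
```

The printed values should run from zero for the sharp case down to $-\ln 4$ for the uniform case, with each leveling step lowering the information.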

We can generalize (2.1) to obtain the formula for the information of a
group of random variables X,Y,...,Z, with joint distribution $P(x_i,y_j,\ldots,z_k)$,
which we denote by $I_{XY\ldots Z}$:

(2.2)  $I_{XY\ldots Z} = \sum_{i,j,\ldots,k} P(x_i,y_j,\ldots,z_k) \ln P(x_i,y_j,\ldots,z_k) = \mathrm{Exp}[\ln P(x_i,y_j,\ldots,z_k)]$ ,

which follows immediately from our previous definition, since the group of
random variables X, Y,...,Z may be regarded as a single random variable
W which takes its values in the cartesian product
$\mathcal{X}\times\mathcal{Y}\times\cdots\times\mathcal{Z}$.

3  A good discussion of information is to be found in Shannon [19], or Woodward
[21]. Note, however, that in the theory of communication one defines the information
of a state $x_i$, which has a priori probability $P_i$, to be $-\ln P_i$. We prefer,
however, to regard information as a property of the distribution itself.

Finally, we define a conditional information, $I^{v_m,\ldots,w_n}_{XY\ldots Z}$, to be:

(2.3)  $I^{v_m,\ldots,w_n}_{XY\ldots Z} = \sum_{i,j,\ldots,k} P^{v_m,\ldots,w_n}(x_i,y_j,\ldots,z_k) \ln P^{v_m,\ldots,w_n}(x_i,y_j,\ldots,z_k) = \mathrm{Exp}^{v_m,\ldots,w_n}\big[\ln P^{v_m,\ldots,w_n}(x_i,y_j,\ldots,z_k)\big]$ ,

a quantity which measures our information about X, Y,...,Z given that we
know that V...W have taken the particular values $v_m,\ldots,w_n$.

For independent random variables X, Y,...,Z, the following relationship
is easily proved:

(2.4)  $I_{XY\ldots Z} = I_X + I_Y + \cdots + I_Z$   (X,Y,...,Z independent) ,

so that the information of XY...Z is the sum of the individual quantities
of information, which is in accord with our intuitive feeling that if we are
given information about unrelated events, our total knowledge is the sum
of the separate amounts of information. We shall generalize this definition
later, in §5.

§3.  Correlation  for  finite  distributions 

Suppose that we have a pair of random variables, X and Y, with
joint distribution $P(x_i,y_j)$. If we say that X and Y are correlated,
what we intuitively mean is that one learns something about one variable
when one is told the value of the other. Let us focus our attention upon
the variable X. If we are not informed of the value of Y, then our information
concerning X, $I_X$, is calculated from the marginal distribution
$P(x_i)$. However, if we are now told that Y has the value $y_j$, then our
information about X changes to the information of the conditional distribution
$P^{y_j}(x_i)$, $I_X^{y_j}$. According to what we have said, we wish the degree
of correlation to measure how much we learn about X by being informed of
Y's value. However, since the change of information, $I_X^{y_j} - I_X$, may depend
upon the particular value, $y_j$, of Y which we are told, the natural
thing to do to arrive at a single number to measure the strength of correlation
is to consider the expected change in information about X, given
that we are to be told the value of Y. This quantity we call the correlation
information, or for brevity, the correlation, of X and Y, and denote
it by $\{X,Y\}$. Thus:

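(3.1)  $\{X,Y\} = \mathrm{Exp}\big[I_X^{y_j} - I_X\big] = \mathrm{Exp}\big[I_X^{y_j}\big] - I_X$ .

Expanding the quantity $\mathrm{Exp}\big[I_X^{y_j}\big]$ using (2.3) and the rules for
expectations (1.6) to (1.8) we find:

(3.2)  $\mathrm{Exp}\big[I_X^{y_j}\big] = \mathrm{Exp}\big[\mathrm{Exp}^{y_j}[\ln P^{y_j}(x_i)]\big] = \mathrm{Exp}\left[\ln \frac{P(x_i,y_j)}{P(y_j)}\right] = \mathrm{Exp}[\ln P(x_i,y_j)] - \mathrm{Exp}[\ln P(y_j)] = I_{XY} - I_Y$ ,

and combining with (3.1) we have:

(3.3)  $\{X,Y\} = I_{XY} - I_X - I_Y$ .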

Thus  the  correlation  is  symmetric  between  X  and  Y,  and  hence  also 
equal  to  the  expected  change  of  information  about  Y  given  that  we  will 
be  told  the  value  of  X.  Furthermore,  according  to  (3.3)  the  correlation 
corresponds  precisely  to  the  amount  of  “missing  information”  if  we 
possess  only  the  marginal  distributions,  i.e.,  the  loss  of  information  if  we 
choose  to  regard  the  variables  as  independent. 

THEOREM 1.  $\{X,Y\} = 0$ if and only if X and Y are independent, and
is otherwise strictly positive. (Proof in Appendix I.)
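The equivalence of (3.1) and (3.3), and the content of Theorem 1, can be illustrated on two small tables. This is a sketch under assumed data: the joint distributions `dependent` and `independent` and all function names are hypothetical.

```python
import math

def information(probs):
    """I = sum P ln P, with 0 ln 0 = 0."""
    return sum(p * math.log(p) for p in probs if p > 0)

def marginals(joint):
    """Row sums give P(x_i); column sums give P(y_j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return px, py

def correlation(joint):
    """{X,Y} = I_XY - I_X - I_Y, equation (3.3)."""
    px, py = marginals(joint)
    i_xy = information([p for row in joint for p in row])
    return i_xy - information(px) - information(py)

def expected_information_gain(joint):
    """Exp[I_X^{y_j}] - I_X, equation (3.1)."""
    px, py = marginals(joint)
    expected = 0.0
    for j, p_y in enumerate(py):
        if p_y > 0:
            conditional = [row[j] / p_y for row in joint]  # P^{y_j}(x_i)
            expected += p_y * information(conditional)
    return expected - information(px)

dependent = [[0.4, 0.1],
             [0.1, 0.4]]
independent = [[a * b for b in (0.6, 0.4)] for a in (0.3, 0.7)]

print(correlation(dependent), expected_information_gain(dependent))
print(correlation(independent))
```

For the dependent table both definitions should give the same strictly positive number, while the product table should give zero up to rounding.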
In this respect the correlation so defined is superior to the usual
correlation coefficients of statistics, such as covariance, etc., which can be
zero even when the variables are not independent, and which can assume
both positive and negative values. An inverse correlation is, after all,
quite as useful as a direct correlation. Furthermore, the correlation has the
great advantage of depending upon the probabilities alone, and not upon any
numerical values of $x_i$ and $y_j$, so that it is defined for distributions
over sets whose elements are of an arbitrary nature, and not only for
distributions over numerical properties. For example, we might have a joint
probability distribution for the political party and religious affiliation of
individuals. Correlation and information are defined for such distributions,
although they possess nothing like covariance or variance.
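A standard textbook case makes the contrast with covariance concrete. In the sketch below (an illustration, not from the original text) X is assumed uniform on {-1, 0, 1} and Y = X², so that Y is completely determined by X although the covariance vanishes; the function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical example: X uniform on {-1, 0, 1}, Y = X * X.
joint = {(x, x * x): 1.0 / 3.0 for x in (-1, 0, 1)}

def marginals(joint):
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return px, py

def covariance(joint):
    ex = sum(p * x for (x, _), p in joint.items())
    ey = sum(p * y for (_, y), p in joint.items())
    exy = sum(p * x * y for (x, y), p in joint.items())
    return exy - ex * ey

def correlation(joint):
    """{X,Y} = Exp[ln P(x,y) / (P(x) P(y))], the two-variable case of (3.5)."""
    px, py = marginals(joint)
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

print("covariance :", covariance(joint))
print("correlation:", correlation(joint))
```

The covariance should come out as zero, while the correlation should be strictly positive (here it equals the negative of the information of Y, since Y is a function of X).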

We can generalize (3.3) to define a group correlation for the groups of
random variables (U...V), (W...X),..., (Y...Z), denoted by
$\{U\ldots V, W\ldots X, \ldots, Y\ldots Z\}$ (where the groups are separated by commas), to be:

(3.4)  $\{U\ldots V, W\ldots X, \ldots, Y\ldots Z\} = I_{U\ldots VW\ldots X\ldots Y\ldots Z} - I_{U\ldots V} - I_{W\ldots X} - \cdots - I_{Y\ldots Z}$ ,

again measuring the information deficiency for the group marginals. Theorem 1
is also satisfied by the group correlation, so that it is zero if and
only if the groups are mutually independent. We can, of course, also define
conditional correlations in the obvious manner, denoting these quantities
by appending the conditional values as superscripts, as before.

We conclude this section by listing some useful formulas and inequalities
which are easily proved:

(3.5)  $\{U,V,\ldots,W\} = \mathrm{Exp}\left[\ln \frac{P(u_i,v_j,\ldots,w_k)}{P(u_i)P(v_j)\cdots P(w_k)}\right]$ ,

(3.6)  $\{U,V,\ldots,W\}^{x_i\ldots y_j} = \mathrm{Exp}^{x_i\ldots y_j}\left[\ln \frac{P^{x_i\ldots y_j}(u_k,v_\ell,\ldots,w_m)}{P^{x_i\ldots y_j}(u_k)\,P^{x_i\ldots y_j}(v_\ell)\cdots P^{x_i\ldots y_j}(w_m)}\right]$   (conditional correlation) ,

(3.7)  $\{\ldots,U,V,\ldots\} = \{\ldots,UV,\ldots\} + \{U,V\}$ ,
       $\{\ldots,U,V,\ldots,W,\ldots\} = \{\ldots,UV\ldots W,\ldots\} + \{U,V,\ldots,W\}$   (comma removal) ,

(3.8)  $\{\ldots,U,VW,\ldots\} - \{\ldots,UV,W,\ldots\} = \{U,V\} - \{V,W\}$   (commutator) ,

(3.9)  $\{X\} = 0$   (definition of bracket with no commas) ,

(3.10)  $\{\ldots,XXV,\ldots\} = \{\ldots,XV,\ldots\}$   (removal of repeated variable within a group) ,

(3.11)  $\{\ldots,UV,VW,\ldots\} = \{\ldots,UV,W,\ldots\} - \{V,W\} - I_V$   (removal of repeated variable in separate groups) ,

(3.12)  $\{X,X\} = -I_X$   (self correlation) ,

(3.13)  $\{U,VW,X\}^{\ldots w_j\ldots} = \{U,V,X\}^{\ldots w_j\ldots}$ ,
        $\{U,W,X\}^{\ldots w_j\ldots} = \{U,X\}^{\ldots w_j\ldots}$   (removal of conditioned variables) ,

(3.14)  $\{XY,Z\} \geq \{X,Z\}$ ,

(3.15)  $\{XY,Z\} \geq \{X,Z\} + \{Y,Z\} - \{X,Y\}$ ,

(3.16)  $\{X,Y,Z\} \geq \{X,Y\} + \{X,Z\}$ .


Note that in the above formulas any random variable W may be replaced
by any group XY...Z and the relation holds true, since the set
XY...Z may be regarded as the single random variable W, which takes
its values in the cartesian product $\mathcal{X}\times\mathcal{Y}\times\cdots\times\mathcal{Z}$.
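Several of the bracket relations can be spot-checked numerically through the definition (3.4). The sketch below is illustrative only: the joint table over three binary variables (read as X, Y, Z) and the helpers `info` and `corr` are hypothetical, and a numerical check on one table is of course not a proof.

```python
import math
from collections import defaultdict

# Hypothetical joint distribution P(x, y, z) over three binary variables.
P = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05,
    (0, 1, 0): 0.20, (0, 1, 1): 0.15,
    (1, 0, 0): 0.05, (1, 0, 1): 0.25,
    (1, 1, 0): 0.10, (1, 1, 1): 0.10,
}
X, Y, Z = 0, 1, 2  # axis positions of the three variables

def info(joint, axes):
    """Information of the marginal distribution over the given axes."""
    marginal = defaultdict(float)
    for outcome, p in joint.items():
        marginal[tuple(outcome[a] for a in axes)] += p
    return sum(p * math.log(p) for p in marginal.values() if p > 0)

def corr(joint, *groups):
    """Group correlation (3.4): joint information minus group informations."""
    all_axes = tuple(a for g in groups for a in g)
    return info(joint, all_axes) - sum(info(joint, g) for g in groups)

xyz = corr(P, (X,), (Y,), (Z,))   # {X,Y,Z}
xy_z = corr(P, (X, Y), (Z,))      # {XY,Z}
xy = corr(P, (X,), (Y,))          # {X,Y}
xz = corr(P, (X,), (Z,))          # {X,Z}
yz = corr(P, (Y,), (Z,))          # {Y,Z}

print("comma removal (3.7):", xyz, "=", xy_z + xy)
print("(3.14):", xy_z, ">=", xz)
print("(3.15):", xy_z, ">=", xz + yz - xy)
print("(3.16):", xyz, ">=", xy + xz)
```

Changing the entries of `P` (keeping them non-negative and summing to one) gives a quick way to probe the relations on other examples.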


§4.  Generalization  and  further  properties  of  correlation 

Until now we have been concerned only with finite probability distributions,
for which we have defined information and correlation. We shall
now generalize the definition of correlation so as to be applicable to joint
probability distributions over arbitrary sets of unrestricted cardinality.
We first consider the effects of refinement of a finite distribution. For
example, we may discover that the event $x_i$ is actually the disjunction
of several exclusive events $\tilde{x}_i^1,\ldots,\tilde{x}_i^\mu$, so that $x_i$ occurs if any one of
the $\tilde{x}_i^\mu$ occurs, i.e., the single event $x_i$ results from failing to
distinguish between the $\tilde{x}_i^\mu$. The probability distribution which distinguishes
between the $\tilde{x}_i^\mu$ will be called a refinement of the distribution which does
not. In general, we shall say that a distribution $P' = P'(\tilde{x}_i^\mu,\ldots,\tilde{y}_j^\nu)$ is a
refinement of $P = P(x_i,\ldots,y_j)$ if

(4.1)  $P(x_i,\ldots,y_j) = \sum_{\mu\ldots\nu} P'(\tilde{x}_i^\mu,\ldots,\tilde{y}_j^\nu)$   (all i,...,j) .

We now state an important theorem concerning the behavior of correlation
under a refinement of a joint probability distribution:

THEOREM 2.  $P'$ is a refinement of $P$ $\Rightarrow$ $\{X,\ldots,Y\}' \geq \{X,\ldots,Y\}$, so that
correlations never decrease upon refinement of a distribution. (Proof in
Appendix I, §3.)

As an example, suppose that we have a continuous probability density
P(x,y). By division of the axes into a finite number of intervals, $\bar{x}_i$, $\bar{y}_j$,
we arrive at a finite joint distribution $P_{ij}$, by integration of P(x,y) over
the rectangle whose sides are the intervals $\bar{x}_i$ and $\bar{y}_j$, and which
represents the probability that $X \in \bar{x}_i$ and $Y \in \bar{y}_j$. If we now subdivide the
intervals, the new distribution $P'$ will be a refinement of $P$, and by
Theorem 2 the correlation $\{X,Y\}$ computed from $P'$ will never be less
than that computed from $P$. Theorem 2 is seen to be simply the mathematical
verification of the intuitive notion that closer analysis of a situation
in which quantities X and Y are dependent can never lessen the knowledge
about Y which can be obtained from X.
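The behavior described by Theorem 2 can be observed on a simple density. The sketch below is an illustration under an assumed density, P(x, y) = x + y on the unit square (chosen only because its rectangle integrals have a closed form); the grid sizes and function names are likewise hypothetical. Each grid is obtained from the previous one by halving every interval, so each is a refinement of the one before.

```python
import math

def cell_probability(a, b, c, d):
    """Integral of the assumed density P(x, y) = x + y over [a, b] x [c, d]."""
    return 0.5 * (b * b - a * a) * (d - c) + 0.5 * (d * d - c * c) * (b - a)

def discretize(n):
    """Finite joint distribution P_ij on an n-by-n grid of equal intervals."""
    edges = [k / n for k in range(n + 1)]
    return [[cell_probability(edges[i], edges[i + 1], edges[j], edges[j + 1])
             for j in range(n)] for i in range(n)]

def information(probs):
    return sum(p * math.log(p) for p in probs if p > 0)

def correlation(joint):
    """{X,Y} = I_XY - I_X - I_Y for a finite joint distribution."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    flat = [p for row in joint for p in row]
    return information(flat) - information(px) - information(py)

grid_sizes = (1, 2, 4, 8, 16)
values = [correlation(discretize(n)) for n in grid_sizes]
for n, value in zip(grid_sizes, values):
    print(n, value)
```

The printed correlations should form a non-decreasing sequence, starting at zero for the trivial one-cell partition.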

This theorem allows us to give a general definition of correlation
which will apply to joint distributions over completely arbitrary sets, i.e.,
for any probability measure⁴ on an arbitrary product space, in the following
manner:

Assume that we have a collection of arbitrary sets $\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$, and a
probability measure, $M_P(\mathcal{X}\times\mathcal{Y}\times\cdots\times\mathcal{Z})$, on their cartesian product. Let
$\mathcal{P}^\mu$ be any finite partition of $\mathcal{X}$ into subsets $\mathcal{X}_i^\mu$, $\mathcal{Y}$ into subsets
$\mathcal{Y}_j^\mu$,..., and $\mathcal{Z}$ into subsets $\mathcal{Z}_k^\mu$, such that the sets
$\mathcal{X}_i^\mu\times\mathcal{Y}_j^\mu\times\cdots\times\mathcal{Z}_k^\mu$
of the cartesian product are measurable in the probability measure $M_P$.
Another partition $\mathcal{P}^\nu$ is a refinement of $\mathcal{P}^\mu$, $\mathcal{P}^\nu \leq \mathcal{P}^\mu$, if $\mathcal{P}^\nu$ results
from $\mathcal{P}^\mu$ by further subdivision of the subsets $\mathcal{X}_i^\mu, \mathcal{Y}_j^\mu, \ldots, \mathcal{Z}_k^\mu$. Each
partition $\mathcal{P}^\mu$ results in a finite probability distribution, for which the
correlation, $\{X,Y,\ldots,Z\}^{\mathcal{P}^\mu}$, is always defined through (3.3). Furthermore a
refinement of a partition leads to a refinement of the probability distribution,
so that by Theorem 2:

(4.8)  $\mathcal{P}^\nu \leq \mathcal{P}^\mu \;\Rightarrow\; \{X,Y,\ldots,Z\}^{\mathcal{P}^\nu} \geq \{X,Y,\ldots,Z\}^{\mathcal{P}^\mu}$ .

Now the set of all partitions is partially ordered under the refinement
relation. Moreover, because for any pair of partitions $\mathcal{P}$, $\mathcal{P}'$ there is
always a third partition $\mathcal{P}''$ which is a refinement of both (common lower
bound), the set of all partitions forms a directed set.⁵ For a function, f,
on a directed set, $\mathcal{S}$, one defines a directed set limit, lim f:

DEFINITION.  lim f exists and is equal to $a$ $\Leftrightarrow$ for every $\varepsilon > 0$ there
exists an $\alpha \in \mathcal{S}$ such that $|f(\beta) - a| < \varepsilon$ for every $\beta \in \mathcal{S}$ for which $\beta \leq \alpha$.

It is easily seen from the directed set property of common lower bounds
that if this limit exists it is necessarily unique.


4  A measure is a non-negative, countably additive set function, defined on some
subsets of a given set. It is a probability measure if the measure of the entire set
is unity. See Halmos [12].

5  See Kelley [15], p. 65.
By (4.8) the correlation $\{X,Y,\ldots,Z\}^{\mathcal{P}}$ is a monotone function on the
directed set of all partitions. Consequently the directed set limit, which
we shall take as the basic definition of the correlation $\{X,Y,\ldots,Z\}$,
always exists. (It may be infinite, but it is in every case well defined.)
Thus:

DEFINITION.  $\{X,Y,\ldots,Z\} = \lim \{X,Y,\ldots,Z\}^{\mathcal{P}}$ ,

and we have succeeded in our endeavor to give a completely general definition
of correlation, applicable to all types of distributions.

It is an immediate consequence of (4.8) that this directed set limit is
the supremum of $\{X,Y,\ldots,Z\}^{\mathcal{P}}$, so that:

(4.9)  $\{X,Y,\ldots,Z\} = \sup_{\mathcal{P}} \{X,Y,\ldots,Z\}^{\mathcal{P}}$ ,

which we could equally well have taken as the definition.

Due to the fact that the correlation is defined as a limit for discrete
distributions, Theorem 1 and all of the relations (3.7) to (3.15), which
contain only correlation brackets, remain true for arbitrary distributions.
Only (3.11) and (3.12), which contain information terms, cannot be extended.

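We can now prove an important theorem about correlation which concerns
its invariant nature. Let $\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$ be arbitrary sets with probability
measure $M_P$ on their cartesian product. Let f be any one-one
mapping of $\mathcal{X}$ onto a set $\mathcal{U}$, g a one-one map of $\mathcal{Y}$ onto $\mathcal{V}$,..., and h
a one-one map of $\mathcal{Z}$ onto $\mathcal{W}$. Then a joint probability distribution over
$\mathcal{X}\times\mathcal{Y}\times\cdots\times\mathcal{Z}$ leads also to one over $\mathcal{U}\times\mathcal{V}\times\cdots\times\mathcal{W}$, where the probability
$M'_P$ induced on the product $\mathcal{U}\times\mathcal{V}\times\cdots\times\mathcal{W}$ is simply the measure which
assigns to each subset of $\mathcal{U}\times\mathcal{V}\times\cdots\times\mathcal{W}$ the measure which is the measure
of its image set in $\mathcal{X}\times\mathcal{Y}\times\cdots\times\mathcal{Z}$ for the original measure $M_P$. (We have
simply transformed to a new set of random variables: U = f(X), V = g(Y),
..., W = h(Z).) Consider any partition $\mathcal{P}$ of $\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$ into the subsets
$\{\mathcal{X}_i\}, \{\mathcal{Y}_j\}, \ldots, \{\mathcal{Z}_k\}$ with probability distribution
$P_{ij\ldots k} = M_P(\mathcal{X}_i\times\mathcal{Y}_j\times\cdots\times\mathcal{Z}_k)$.
Then there is a corresponding partition $\mathcal{P}'$ of $\mathcal{U}, \mathcal{V}, \ldots, \mathcal{W}$ into the image
sets of the sets of $\mathcal{P}$, $\{\mathcal{U}_i\}, \{\mathcal{V}_j\}, \ldots, \{\mathcal{W}_k\}$, where $\mathcal{U}_i = f(\mathcal{X}_i)$, $\mathcal{V}_j = g(\mathcal{Y}_j)$,
..., $\mathcal{W}_k = h(\mathcal{Z}_k)$. But the probability distribution for $\mathcal{P}'$ is the same as that
for $\mathcal{P}$, since $P'_{ij\ldots k} = M'_P(\mathcal{U}_i\times\mathcal{V}_j\times\cdots\times\mathcal{W}_k) = M_P(\mathcal{X}_i\times\mathcal{Y}_j\times\cdots\times\mathcal{Z}_k) = P_{ij\ldots k}$,
so that:

(4.10)  $\{X,Y,\ldots,Z\}^{\mathcal{P}} = \{U,V,\ldots,W\}^{\mathcal{P}'}$ .

Due to the correspondence between the $\mathcal{P}$'s and $\mathcal{P}'$'s we have that:

(4.11)  $\sup_{\mathcal{P}} \{X,Y,\ldots,Z\}^{\mathcal{P}} = \sup_{\mathcal{P}'} \{U,V,\ldots,W\}^{\mathcal{P}'}$ ,

and by virtue of (4.9) we have proved the following theorem:

THEOREM 3.  $\{X,Y,\ldots,Z\} = \{U,V,\ldots,W\}$, where $\mathcal{U}, \mathcal{V}, \ldots, \mathcal{W}$ are any one-one
images of $\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$, respectively. In other notation:
$\{X,Y,\ldots,Z\} = \{f(X), g(Y), \ldots, h(Z)\}$ for all one-one functions f, g,..., h.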

This  means  that  changing  variables  to  functionally  related  variables 
preserves  the  correlation.  Again  this  is  plausible  on  intuitive  grounds, 
since  a  knowledge  of  f(x)  is  just  as  good  as  knowledge  of  x,  provided 
that  f  is  one-one. 

A special consequence of Theorem 3 is that for any continuous probability
density P(x, y) over real numbers the correlation between f(x)
and g(y) is the same as between x and y, where f and g are any
real valued one-one functions. As an example consider a probability
distribution for the position of two particles, so that the random variables
are the position coordinates. Theorem 3 then assures us that the position
correlation is independent of the coordinate system, even if different
coordinate systems are used for each particle! Also for a joint distribution
for a pair of events in space-time the correlation is invariant to arbitrary
space-time coordinate transformations, again even allowing different
transformations for the coordinates of each event.
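For a finite distribution the invariance of Theorem 3 amounts to relabeling the outcomes, which can be shown in a few lines. The sketch below is an illustration only: the joint table and the one-one maps `f` and `g` are hypothetical choices. The covariance is printed alongside for contrast, since it does depend on the numerical values assigned to the outcomes.

```python
import math
from collections import defaultdict

# Hypothetical joint distribution over numerical values (x, y).
joint = {(1, 1): 0.30, (1, 2): 0.10,
         (2, 1): 0.10, (2, 2): 0.20,
         (3, 1): 0.05, (3, 2): 0.25}

def f(x):
    """A one-one map of the x values (hypothetical choice)."""
    return math.exp(x)

def g(y):
    """A one-one map of the y values (hypothetical choice)."""
    return -y ** 3

def correlation(dist):
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in dist.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log(p / (px[x] * py[y]))
               for (x, y), p in dist.items() if p > 0)

def covariance(dist):
    ex = sum(p * x for (x, _), p in dist.items())
    ey = sum(p * y for (_, y), p in dist.items())
    return sum(p * x * y for (x, y), p in dist.items()) - ex * ey

# The same probabilities attached to the transformed values U = f(X), V = g(Y).
transformed = {(f(x), g(y)): p for (x, y), p in joint.items()}

print("correlation:", correlation(joint), correlation(transformed))
print("covariance :", covariance(joint), covariance(transformed))
```

The two correlations should coincide, while the two covariances generally differ in both magnitude and sign.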
These examples illustrate clearly the intrinsic nature of the correlation
of various groups for joint probability distributions, which is implied
by its invariance against arbitrary (one-one) transformations of the random
variables. These correlation quantities are thus fundamental properties
of probability distributions. A correlation is an absolute rather than relative
quantity, in the sense that the correlation between (numerical valued)
random variables is completely independent of the scale of measurement
chosen for the variables.


§5.  Information  for  general  distributions 

Although we now have a definition of correlation applicable to all
probability distributions, we have not yet extended the definition of information
past finite distributions. In order to make this extension we first
generalize the definition that we gave for discrete distributions to a definition
of relative information for a random variable, relative to a given
underlying measure, called the information measure, on the values of the
random variable.

If we assign a measure to the set of values of a random variable, X,
which is simply the assignment of a positive number $a_i$ to each value $x_i$
in the finite case, we define the information of a probability distribution
$P(x_i)$ relative to this information measure to be:

(5.1)  $I_X = \sum_i P(x_i) \ln \frac{P(x_i)}{a_i} = \mathrm{Exp}\left[\ln \frac{P(x_i)}{a_i}\right]$ .

If we have a joint distribution of random variables X,Y,...,Z, with
information measures $\{a_i\}, \{b_j\}, \ldots, \{c_k\}$ on their values, then we define
the total information relative to these measures to be:

(5.2)  $I_{XY\ldots Z} = \sum_{ij\ldots k} P(x_i,y_j,\ldots,z_k) \ln \frac{P(x_i,y_j,\ldots,z_k)}{a_i b_j \cdots c_k} = \mathrm{Exp}\left[\ln \frac{P(x_i,y_j,\ldots,z_k)}{a_i b_j \cdots c_k}\right]$ ,

so that the information measure on the cartesian product set is always
taken to be the product measure of the individual information measures.

We shall now alter our previous position slightly and consider information
as always being defined relative to some information measure, so
that our previous definition of information is to be regarded as the information
relative to the measure for which all the $a_i$'s, $b_j$'s,... and $c_k$'s are
taken to be unity, which we shall henceforth call the uniform measure.

Let us now compute the correlation $\{X,Y,\ldots,Z\}'$ by (3.4) using the
relative information:

(5.3)  $\{X,Y,\ldots,Z\}' = I'_{XY\ldots Z} - I'_X - I'_Y - \cdots - I'_Z$
       $= \mathrm{Exp}\left[\ln \frac{P(x_i,y_j,\ldots,z_k)}{a_i b_j \cdots c_k}\right] - \mathrm{Exp}\left[\ln \frac{P(x_i)}{a_i}\right] - \cdots - \mathrm{Exp}\left[\ln \frac{P(z_k)}{c_k}\right]$
       $= \mathrm{Exp}\left[\ln \frac{P(x_i,y_j,\ldots,z_k)}{P(x_i)P(y_j)\cdots P(z_k)}\right] = \{X,Y,\ldots,Z\}$ ,

so that the correlation for discrete distributions, as defined by (3.4), is
independent of the choice of information measure, and the correlation remains
an absolute, not relative quantity. It can, however, be computed
from the information relative to any information measure through (3.4).
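The cancellation in (5.3) can be seen directly in a small computation. The sketch below is illustrative; the joint table and the information measures `a` and `b` are arbitrary hypothetical choices, and the function names are not from the text.

```python
import math

def relative_information(probs, measure):
    """Equation (5.1): sum_i P_i ln(P_i / a_i), with 0 ln 0 = 0."""
    return sum(p * math.log(p / a) for p, a in zip(probs, measure) if p > 0)

def correlation_relative(joint, a, b):
    """{X,Y}' from (3.4), using information relative to measures a and b."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    flat_p = [p for row in joint for p in row]
    flat_m = [ai * bj for ai in a for bj in b]  # product measure, as in (5.2)
    return (relative_information(flat_p, flat_m)
            - relative_information(px, a)
            - relative_information(py, b))

joint = [[0.4, 0.1],
         [0.1, 0.4]]
a = [2.0, 1.0]       # hypothetical information measure on the values of X
b = [3.0, 1.5]       # hypothetical information measure on the values of Y
uniform = [1.0, 1.0]

px = [sum(row) for row in joint]
print("I_X relative to a      :", relative_information(px, a))
print("I_X relative to uniform:", relative_information(px, uniform))
print("correlation, measures a, b:", correlation_relative(joint, a, b))
print("correlation, uniform      :", correlation_relative(joint, uniform, uniform))
```

The marginal informations depend on the measure chosen, but the two correlations should agree up to rounding.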

If we consider refinements of our distributions, as before, and realize
that such a refinement is also a refinement of the information measure,
then we can prove a relation analogous to Theorem 2:


THEOREM 4.  The information of a distribution relative to a given information
measure never decreases under refinement. (Proof in Appendix I.)


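Therefore, just as for correlation, we can define the information of a
probability measure $M_P$ on the cartesian product of arbitrary sets
$\mathcal{X}, \mathcal{Y}, \ldots, \mathcal{Z}$, relative to the information measures $\mu_X, \mu_Y, \ldots, \mu_Z$, on the
individual sets, by considering finite partitions $\mathcal{P}$ into subsets $\{\mathcal{X}_i\}$,
$\{\mathcal{Y}_j\}, \ldots, \{\mathcal{Z}_k\}$, for which we take as the definition of the information:

(5.4)  $I^{\mathcal{P}}_{XY\ldots Z} = \sum_{ij\ldots k} M_P(\mathcal{X}_i,\mathcal{Y}_j,\ldots,\mathcal{Z}_k) \ln \frac{M_P(\mathcal{X}_i,\mathcal{Y}_j,\ldots,\mathcal{Z}_k)}{\mu_X(\mathcal{X}_i)\,\mu_Y(\mathcal{Y}_j)\cdots\mu_Z(\mathcal{Z}_k)}$ .

Then $I^{\mathcal{P}}_{XY\ldots Z}$ is, as was $\{X,Y,\ldots,Z\}^{\mathcal{P}}$, a monotone function upon the
directed set of partitions (by Theorem 4), and as before we take the
directed set limit for our definition:

(5.5)  $I_{XY\ldots Z} = \lim I^{\mathcal{P}}_{XY\ldots Z} = \sup_{\mathcal{P}} I^{\mathcal{P}}_{XY\ldots Z}$ ,

which is then the information relative to the information measures
$\mu_X, \mu_Y, \ldots, \mu_Z$.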

Now, for functions f, g on a directed set the existence of lim f and
lim g is a sufficient condition for the existence of lim(f+g), which is
then lim f + lim g, provided that this is not indeterminate. Therefore:

THEOREM 5.  $\{X,\ldots,Y\} = \lim \{X,\ldots,Y\}^{\mathcal{P}} = \lim \big[I^{\mathcal{P}}_{X\ldots Y} - I^{\mathcal{P}}_X - \cdots - I^{\mathcal{P}}_Y\big] = I_{X\ldots Y} - I_X - \cdots - I_Y$,
where the information is taken relative to any information measure for which
the expression is not indeterminate. It is
sufficient for the validity of the above expression that the basic measures
$\mu_X,\ldots,\mu_Y$ be such that none of the marginal informations $I_X,\ldots,I_Y$ shall
be positively infinite.

The latter statement holds since, because of the general relation
$I_{X\ldots Y} \geq I_X + \cdots + I_Y$, determinateness of the expression is guaranteed
so long as all of the $I_X,\ldots,I_Y$ are $< +\infty$.

Henceforth, unless otherwise noted, we shall understand that information
is to be computed with respect to the uniform measure for discrete
distributions, and Lebesgue measure for continuous distributions over real
numbers. In case of a mixed distribution, with a continuous density
P(x,y,...,z) plus discrete “lumps” $P'(x_i,y_j,\ldots,z_k)$, we shall understand
the information measure to be the uniform measure over the discrete range,
and Lebesgue measure over the continuous range. These conventions
then lead us to the expressions:

(5.6)  $I_{XY\ldots Z} = \begin{cases} \sum_{ij\ldots k} P(x_i,y_j,\ldots,z_k) \ln P(x_i,y_j,\ldots,z_k) & \text{(discrete)} \\ \int P(x,y,\ldots,z) \ln P(x,y,\ldots,z)\, dx\,dy\cdots dz & \text{(cont.)} \\ \sum_{i\ldots k} P'(x_i,\ldots,z_k) \ln P'(x_i,\ldots,z_k) + \int P(x,\ldots,z) \ln P(x,\ldots,z)\, dx\cdots dz & \text{(mixed)} \end{cases}$
       (unless otherwise noted) .


The  mixed  case  occurs  often  in  quantum  mechanics,  for  quantities 
which  have  both  a  discrete  and  continuous  spectrum. 

§6.  Example:  Information  decay  in  stochastic  processes 

As an example illustrating the usefulness of the concept of relative
information we shall consider briefly stochastic processes.⁶ Suppose that
we have a stationary Markov⁷ process with a finite number of states $S_i$,
and that the process occurs at discrete (integral) times 1,2,...,n,..., at
which times the transition probability from the state $S_i$ to the state $S_j$
is $T_{ij}$. The probabilities $T_{ij}$ then form what is called a stochastic
matrix, i.e., the elements are between 0 and 1, and $\sum_j T_{ij} = 1$ for all
i. If at any time k the probability distribution over the states is $\{P_i^k\}$
then at the next time the probabilities will be $P_j^{k+1} = \sum_i P_i^k T_{ij}$.

6  See Feller [10], or Doob [6].

7  A Markov process is a stochastic process whose future development depends
only upon its present state, and not on its past history.

In the special case where the matrix is doubly-stochastic, which
means that $\sum_i T_{ij}$, as well as $\sum_j T_{ij}$, equals unity, and which amounts
to a principle of detailed balancing holding, it is known that the entropy
of a probability distribution over the states, defined as $H = -\sum_i P_i \ln P_i$,
is a monotone increasing function of the time. This entropy is, however,
simply the negative of the information relative to the uniform measure.

One can extend this result to more general stochastic processes only
if one uses the more general definition of relative information. For an
arbitrary stationary process the choice of an information measure which is
stationary, i.e., for which

(6.1)  $a_j = \sum_i a_i T_{ij}$   (all j) ,

leads to the desired result. In this case the relative information,

(6.2)  $I = \sum_i P_i \ln \frac{P_i}{a_i}$ ,

is a monotone decreasing function of time and constitutes a suitable
basis for the definition of the entropy $H = -I$. Note that this definition
leads to the previous result for doubly-stochastic processes, since the
uniform measure, $a_i = 1$ (all i), is obviously stationary in this case.

One can furthermore drop the requirement that the stochastic process
be stationary, and even allow that there are completely different sets of
states, $\{S_i^n\}$, at each time n, so that the process is now given by a sequence
of matrices $T_{ij}^n$ representing the transition probability at time n
from state $S_i^n$ to state $S_j^{n+1}$. In this case probability distributions
change according to:

(6.3)  $P_j^{n+1} = \sum_i P_i^n T_{ij}^n$ .

If we then choose any time-dependent information measure which satisfies
the relations:

(6.4)  $a_j^{n+1} = \sum_i a_i^n T_{ij}^n$   (all j, n) ,

then the information of a probability distribution is again monotone decreasing
with time. (Proof in Appendix I.)

All  of  these  results  are  easily  extended  to  the  continuous  case,  and 
we  see  that  the  concept  of  relative  information  allows  us  to  define  entropy 
for  quite  general  stochastic  processes. 
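The role of the stationary information measure (6.1) can be seen in a two-state chain that is not doubly-stochastic. The sketch below is an illustration under assumed numbers: the matrix `T`, the measure `a`, and the initial distribution are hypothetical, and the helper names are not from the text.

```python
import math

# Hypothetical stochastic matrix: rows sum to 1, columns do not.
T = [[0.9, 0.1],
     [0.4, 0.6]]
a = [0.8, 0.2]          # stationary measure: a_j = sum_i a_i T_ij, as in (6.1)
uniform = [1.0, 1.0]

def step(p):
    """One time step: P_j^{k+1} = sum_i P_i^k T_ij."""
    n = len(p)
    return [sum(p[i] * T[i][j] for i in range(n)) for j in range(n)]

def relative_information(p, measure):
    """Equation (6.2): I = sum_i P_i ln(P_i / a_i), with 0 ln 0 = 0."""
    return sum(pi * math.log(pi / ai) for pi, ai in zip(p, measure) if pi > 0)

p = [0.5, 0.5]          # hypothetical initial distribution
relative = [relative_information(p, a)]
plain = [relative_information(p, uniform)]
for _ in range(10):
    p = step(p)
    relative.append(relative_information(p, a))
    plain.append(relative_information(p, uniform))

for k, (r, u) in enumerate(zip(relative, plain)):
    print(k, r, u)
```

In this example the information relative to the stationary measure should decrease at every step, whereas the information relative to the uniform measure rises, which is why the uniform measure is unsuitable when the matrix is not doubly-stochastic.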

§7.  Example:  Conservation  of  information  in  classical  mechanics 

As a second illustrative example we consider briefly the classical
mechanics of a group of particles. The system at any instant is represented
by a point, $(x^1,y^1,z^1,p_x^1,p_y^1,p_z^1,\ldots,x^n,y^n,z^n,p_x^n,p_y^n,p_z^n)$, in the phase
space of all position and momentum coordinates. The natural motion of
the system then carries each point into another, defining a continuous
transformation of the phase space into itself. According to Liouville's
theorem the measure of a set of points of the phase space is invariant
under this transformation.⁸ This invariance of measure implies that if we
begin with a probability distribution over the phase space, rather than a
single point, the total information

(7.1)  $I_{\mathrm{total}} = I_{X^1 Y^1 Z^1 P_x^1 P_y^1 P_z^1 \ldots X^n Y^n Z^n P_x^n P_y^n P_z^n}$ ,

which is the information of the joint distribution for all positions and
momenta, remains constant in time.


8  See Khinchin [16], p. 15.
In order to see that the total information is conserved, consider any
partition $\mathcal{P}$ of the phase space at one time, $t_0$, with its information
relative to the phase space measure, $I^{\mathcal{P}}(t_0)$. At a later time $t_1$ a partition
$\mathcal{P}'$, into the image sets of $\mathcal{P}$ under the mapping of the space into
itself, is induced, for which the probabilities for the sets of $\mathcal{P}'$ are the
same as those of the corresponding sets of $\mathcal{P}$, and furthermore for which
the measures are the same, by Liouville's theorem. Thus corresponding
to each partition $\mathcal{P}$ at time $t_0$ with information $I^{\mathcal{P}}(t_0)$, there is a partition
$\mathcal{P}'$ at time $t_1$ with information $I^{\mathcal{P}'}(t_1)$, which is the same:

(7.2)  $I^{\mathcal{P}'}(t_1) = I^{\mathcal{P}}(t_0)$ .

Due to the correspondence of the $\mathcal{P}$'s and $\mathcal{P}'$'s the supremums of each
over all partitions must be equal, and by (5.5) we have proved that

(7.3)  $I_{\mathrm{total}}(t_1) = I_{\mathrm{total}}(t_0)$ ,

and the total information is conserved.

Now it is known that the individual (marginal) position and momentum
distributions tend to decay, except for rare fluctuations, into the uniform
and Maxwellian distributions respectively, for which the classical entropy
is a maximum. This entropy is, however, except for the factor of Boltzmann's
constant, simply the negative of the marginal information

(7.4)  $I_{\mathrm{marginal}} = I_{X^1} + I_{Y^1} + I_{Z^1} + \cdots + I_{P_x^n} + I_{P_y^n} + I_{P_z^n}$ ,

which thus tends towards a minimum. But this decay of marginal information
is exactly compensated by an increase of the total correlation information

(7.5)  $\{\mathrm{total}\} = I_{\mathrm{total}} - I_{\mathrm{marginal}}$ ,

since the total information remains constant. Therefore, if one were to
define the total entropy to be the negative of the total information, one
could replace the usual second law of thermodynamics by a law of
conservation of total entropy, where the increase in the standard (marginal)
entropy is exactly compensated by a (negative) correlation entropy. The
usual second law then results simply from our renunciation of all correlation
knowledge (stosszahlansatz), and not from any intrinsic behavior of
classical systems. The situation for classical mechanics is thus in sharp
contrast to that of stochastic processes, which are intrinsically irreversible.
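The bookkeeping of (7.5) can be imitated in a toy model. The sketch below is only a discrete stand-in for the argument, not classical mechanics: the "phase space" is an assumed 8-by-8 grid of (x, p) values with the counting (uniform) measure, and the "flow" is a hypothetical invertible map of the grid onto itself, which preserves that measure in the way Liouville's theorem does for the true phase-space flow. All names are illustrative.

```python
import math
from collections import defaultdict

N = 8  # hypothetical grid size of the toy phase space

def flow(point):
    """An invertible, measure-preserving map of the grid (a discrete 'cat map')."""
    x, p = point
    return ((x + p) % N, (x + 2 * p) % N)

def information(probs):
    return sum(q * math.log(q) for q in probs if q > 0)

def summarize(dist):
    """Return (I_total, I_marginal, correlation) as in (7.1), (7.4), (7.5)."""
    px, pp = defaultdict(float), defaultdict(float)
    for (x, p), q in dist.items():
        px[x] += q
        pp[p] += q
    i_total = information(dist.values())
    i_marginal = information(px.values()) + information(pp.values())
    return i_total, i_marginal, i_total - i_marginal

def evolve(dist):
    """Carry each point, with its probability, to its image under the flow."""
    new = defaultdict(float)
    for point, q in dist.items():
        new[flow(point)] += q
    return dict(new)

# Initial distribution: uniform over a 2-by-2 block, so x and p are independent.
dist = {(x, p): 0.25 for x in (0, 1) for p in (0, 1)}
history = [summarize(dist)]
for _ in range(5):
    dist = evolve(dist)
    history.append(summarize(dist))

for t, (i_total, i_marginal, corr) in enumerate(history):
    print(t, i_total, i_marginal, corr)
```

In the output the total information should stay fixed at every step, while after the first step the marginal information has dropped and the correlation has risen by the same amount.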


III.  QUANTUM  MECHANICS 


Having mathematically formulated the ideas of information and correlation
for probability distributions, we turn to the field of quantum mechanics.
In this chapter we assume that the states of physical systems are represented
by points in a Hilbert space, and that the time dependence of the
state of an isolated system is governed by a linear wave equation.

It is well known that state functions lead to distributions over eigenvalues
of Hermitian operators (square amplitudes of the expansion coefficients
of the state in terms of the basis consisting of eigenfunctions of
the operator) which have the mathematical properties of probability distributions
(non-negative and normalized). The standard interpretation of
quantum mechanics regards these distributions as actually giving the
probabilities that the various eigenvalues of the operator will be observed,
when a measurement represented by the operator is performed.

A  feature  of  great  importance  to  our  interpretation  is  the  fact  that  a 
state  function  of  a  composite  system  leads  to  joint  distributions  over  sub¬ 
system  quantities,  rather  than  independent  subsystem  distributions,  i.e., 
the  quantities  in  different  subsystems  may  be  correlated  with  one  another. 
The  first  section  of  this  chapter  is  accordingly  devoted  to  the  development 
of  the  formalism  of  composite  systems,  and  the  connection  of  composite 
system  states  and  their  derived  joint  distributions  with  the  various  possible 
subsystem  conditional  and  marginal  distributions.  We  shall  see  that  there 
exist  relative  state  functions  which  correctly  give  the  conditional  distri¬ 
butions  for  all  subsystem  operators,  while  marginal  distributions  can  not 
generally  be  represented  by  state  functions,  but  only  by  density  matrices. 

In  Section  2  the  concepts  of  information  and  correlation,  developed 
in  the  preceding  chapter,  are  applied  to  quantum  mechanics,  by  defining 
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information  and  correlation  for  operators  on  systems  with  prescribed 
states.  It  is  also  shown  that  for  composite  systems  there  exists  a  quantity 
which  can  be  thought  of  as  the  fundamental  correlation  between  subsys¬ 
tems,  and  a  closely  related  canonical  representation  of  the  composite  sys¬ 
tem  state.  In  addition,  a  stronger  form  of  the  uncertainty  principle,  phrased 
in  information  language,  is  indicated. 

The  third  section  takes  up  the  question  of  measurement  in  quantum 
mechanics,  viewed  as  a  correlation  producing  interaction  between  physical 
systems.  A  simple  example  of  such  a  measurement  is  given  and  discussed. 
Finally  some  general  consequences  of  the  superposition  principle  are  con¬ 
sidered. 

It is convenient at this point to introduce some notational conventions.
We shall be concerned with points $\psi$ in a Hilbert space $\mathcal{H}$, with scalar
product $(\psi_1,\psi_2)$. A state is a point $\psi$ for which $(\psi,\psi) = 1$. For any
linear operator $A$ we define a functional, $\langle A\rangle\psi$, called the expectation
of $A$ for $\psi$, to be:

$\langle A\rangle\psi = (\psi, A\psi)$ .

A class of operators of particular interest is the class of projection
operators. The operator $[\phi]$, called the projection on $\phi$, is defined through:

$[\phi]\psi = (\phi,\psi)\,\phi$ .

For a complete orthonormal set $\{\phi_i\}$ and a state $\psi$ we define a
square-amplitude distribution, $P_i$, called the distribution of $\psi$ over
$\{\phi_i\}$ through:

$P_i = |(\phi_i,\psi)|^2 = \langle[\phi_i]\rangle\psi$ .

In the probabilistic interpretation this distribution represents the
probability distribution over the results of a measurement with eigenstates $\phi_i$,
performed upon a system in the state $\psi$. (Hereafter when referring to the
probabilistic interpretation we shall say briefly "the probability that the
system will be found in $\phi_i$", rather than the more cumbersome phrase
"the probability that the measurement of a quantity $B$, with
eigenfunctions $\{\phi_i\}$, shall yield the eigenvalue corresponding to $\phi_i$," which is
meant.)
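The conventions above can be checked numerically in a finite-dimensional space. The following is a minimal sketch, assuming a two-dimensional Hilbert space with vectors stored as Python lists; the helper names and the particular state and basis are illustrative choices, not part of the text.

```python
def inner(u, v):
    """Scalar product (u, v), antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def project(phi, psi):
    """Apply the projection [phi] to psi: [phi]psi = (phi, psi) phi."""
    c = inner(phi, psi)
    return [c * x for x in phi]

def expectation_of_projection(phi, psi):
    """<[phi]>psi = (psi, [phi]psi)."""
    return inner(psi, project(phi, psi))

s = 2 ** -0.5
basis = [[s, s], [s, -s]]   # a complete orthonormal set {phi_i} (illustrative)
psi = [0.6, 0.8]            # a state: (psi, psi) = 1

# Square-amplitude distribution of psi over {phi_i}.
P = [abs(inner(phi, psi)) ** 2 for phi in basis]
print(P, sum(P))
```

The distribution is non-negative and sums to one, and each `P[i]` agrees with the expectation of the corresponding projection.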

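For two Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$, we form the direct product
Hilbert space $\mathcal{H}_3 = \mathcal{H}_1 \otimes \mathcal{H}_2$ (tensor product) which is taken to be the space
of all possible$^1$ sums of formal products of points of $\mathcal{H}_1$ and $\mathcal{H}_2$, i.e.,
the elements of $\mathcal{H}_3$ are those of the form $\sum_i a_i\,\xi_i\eta_i$ where $\xi_i \in \mathcal{H}_1$ and
$\eta_i \in \mathcal{H}_2$. The scalar product in $\mathcal{H}_3$ is taken to be

$\big(\sum_i a_i\,\xi_i\eta_i,\ \sum_j b_j\,\xi_j\eta_j\big) = \sum_{ij} a_i^* b_j\,(\xi_i,\xi_j)(\eta_i,\eta_j)$ .

It is then easily seen that if $\{\xi_i\}$ and $\{\eta_i\}$ form complete orthonormal
sets in $\mathcal{H}_1$ and $\mathcal{H}_2$ respectively, then the set of all formal products
$\{\xi_i\eta_j\}$ is a complete orthonormal set in $\mathcal{H}_3$. For any pair of operators
$A$, $B$, in $\mathcal{H}_1$ and $\mathcal{H}_2$ there corresponds an operator $C = A \otimes B$, the direct
product of $A$ and $B$, in $\mathcal{H}_3$, which can be defined by its effect on the
elements $\xi_i\eta_j$ of $\mathcal{H}_3$:

$C\,\xi_i\eta_j = A \otimes B\,\xi_i\eta_j = (A\xi_i)(B\eta_j)$ .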
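The defining property of the direct product can be verified in coordinates. The sketch below assumes finite-dimensional spaces and represents the formal product of two vectors by their Kronecker product; the function names and sample vectors are hypothetical.

```python
def kron_vec(u, v):
    """Formal product u v as a vector in H1 (x) H2, index i*len(v) + j."""
    return [a * b for a in u for b in v]

def kron_mat(A, B):
    """Matrix of the direct product A (x) B in the product basis."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def inner(u, v):
    """Scalar product, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

A = [[0, 1], [1, 0]]
B = [[1, 0], [0, -1]]
xi, eta = [1, 2], [3, 4]

# (A (x) B)(xi eta) should equal (A xi)(B eta).
lhs = apply(kron_mat(A, B), kron_vec(xi, eta))
rhs = kron_vec(apply(A, xi), apply(B, eta))
print(lhs, rhs)

# The scalar product of formal products factorizes.
xi2, eta2 = [1, -1], [2, 1]
print(inner(kron_vec(xi, eta), kron_vec(xi2, eta2)),
      inner(xi, xi2) * inner(eta, eta2))
```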

§1.  Composite  systems 

It is well known that if the states of a pair of systems $S_1$ and $S_2$
are represented by points in Hilbert spaces $\mathcal{H}_1$ and $\mathcal{H}_2$ respectively,
then the states of the composite system $S = S_1 + S_2$ (the two systems
$S_1$ and $S_2$ regarded as a single system $S$) are represented correctly by
points of the direct product $\mathcal{H}_1 \otimes \mathcal{H}_2$. This fact has far reaching
consequences which we wish to investigate in some detail. Thus if $\{\xi_i\}$ is a
complete orthonormal set for $\mathcal{H}_1$, and $\{\eta_j\}$ for $\mathcal{H}_2$, the general state of
$S = S_1 + S_2$ has the form:

(1.1)  $\psi^S = \sum_{ij} a_{ij}\,\xi_i\eta_j \qquad \big(\sum_{ij} a_{ij}^* a_{ij} = 1\big)$ .


$^1$ More rigorously, one considers only finite sums, then completes the
resulting space to arrive at $\mathcal{H}_1 \otimes \mathcal{H}_2$.
In this case we shall call $P_{ij} = a_{ij}^* a_{ij}$ the joint square-amplitude
distribution of $\psi^S$ over $\{\xi_i\}$ and $\{\eta_j\}$. In the standard probabilistic
interpretation $a_{ij}^* a_{ij}$ represents the joint probability that $S_1$ will be found in
the state $\xi_i$ and $S_2$ will be found in the state $\eta_j$. Following the
probabilistic model we now derive some distributions from the state $\psi^S$. Let
$A$ be a Hermitian operator in $S_1$ with eigenfunctions $\phi_i$ and eigenvalues
$\lambda_i$, and $B$ an operator in $S_2$ with eigenfunctions $\theta_j$ and eigenvalues
$\mu_j$. Then the joint distribution of $\psi^S$ over $\{\phi_i\}$ and $\{\theta_j\}$, $P_{ij}$, is:

(1.2)  $P_{ij} = P(\phi_i \text{ and } \theta_j) = |(\phi_i\theta_j,\psi^S)|^2$ .

The marginal distributions, of $\psi^S$ over $\{\phi_i\}$ and of $\psi^S$ over $\{\theta_j\}$,
are:

(1.3)  $P_i = P(\phi_i) = \sum_j P_{ij} = \sum_j |(\phi_i\theta_j,\psi^S)|^2$ ,
       $P_j = P(\theta_j) = \sum_i P_{ij} = \sum_i |(\phi_i\theta_j,\psi^S)|^2$ ,

and the conditional distributions $P_i^j$ and $P_j^i$ are:

(1.4)  $P_i^j = P(\phi_i \text{ conditioned on } \theta_j) = P_{ij}/P_j$ ,
       $P_j^i = P(\theta_j \text{ conditioned on } \phi_i) = P_{ij}/P_i$ .


We now define the conditional expectation of an operator $A$ on $S_1$,
conditioned on $\theta_j$ in $S_2$, denoted by $\mathrm{Exp}^{\theta_j}[A]$, to be:

(1.5)  $\mathrm{Exp}^{\theta_j}[A] = \sum_i \lambda_i P_i^j = (1/P_j) \sum_i P_{ij}\lambda_i$
       $= (1/P_j) \sum_i \lambda_i\,|(\phi_i\theta_j,\psi^S)|^2$
       $= (1/P_j) \sum_i |(\phi_i\theta_j,\psi^S)|^2\,(\phi_i, A\phi_i)$ ,

and we define the marginal expectation of $A$ on $S_1$ to be:

(1.6)  $\mathrm{Exp}[A] = \sum_i P_i\lambda_i = \sum_{ij} \lambda_i P_{ij} = \sum_{ij} |(\phi_i\theta_j,\psi^S)|^2\,(\phi_i, A\phi_i)$ .

We shall now introduce projection operators to get more convenient
forms of the conditional and marginal expectations, which will also exhibit
more clearly the degree of dependence of these quantities upon the chosen
basis $\{\phi_i\theta_j\}$. Let the operators $[\phi_i]$ and $[\theta_j]$ be the projections on
$\phi_i$ in $S_1$ and $\theta_j$ in $S_2$ respectively, and let $I^1$ and $I^2$ be the
identity operators in $S_1$ and $S_2$. Then, making use of the identity
$\psi^S = \sum_{ij} (\phi_i\theta_j,\psi^S)\,\phi_i\theta_j$ for any complete orthonormal set $\{\phi_i\theta_j\}$, we have:

(1.7)  $\langle[\phi_i][\theta_j]\rangle\psi^S = (\psi^S, [\phi_i][\theta_j]\psi^S)$
       $= \Big(\sum_{k\ell} (\phi_k\theta_\ell,\psi^S)\,\phi_k\theta_\ell,\ [\phi_i][\theta_j] \sum_{mn} (\phi_m\theta_n,\psi^S)\,\phi_m\theta_n\Big)$
       $= \sum_{k\ell mn} (\phi_k\theta_\ell,\psi^S)^* (\phi_m\theta_n,\psi^S)\,\delta_{km}\delta_{\ell n}\delta_{im}\delta_{jn}$
       $= (\phi_i\theta_j,\psi^S)^* (\phi_i\theta_j,\psi^S) = P_{ij}$ ,

so that the joint distribution is given simply by $\langle[\phi_i][\theta_j]\rangle\psi^S$.

For the marginal distribution we have:

(1.8)  $P_i = \sum_j P_{ij} = \sum_j \langle[\phi_i][\theta_j]\rangle\psi^S = \langle[\phi_i]\big(\sum_j [\theta_j]\big)\rangle\psi^S = \langle[\phi_i]I^2\rangle\psi^S$ ,

and we see that the marginal distribution over the $\phi_i$ is independent of
the set $\{\theta_j\}$ chosen in $S_2$. This result has the consequence in the
ordinary interpretation that the expected outcome of measurement in one
subsystem of a composite system is not influenced by the choice of quantity
to be measured in the other subsystem. This expectation is, in fact, the
expectation for the case in which no measurement at all (identity operator)
is performed in the other subsystem. Thus no measurement in $S_2$ can
affect the expected outcome of a measurement in $S_1$, so long as the
result of any $S_2$ measurement remains unknown. The case is quite different,
however, if this result is known, and we must turn to the conditional
distributions and expectations in such a case.
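The independence of the marginal distribution from the basis chosen in the other subsystem, as stated in (1.8), can be checked directly. The sketch below assumes two two-dimensional subsystems, real rotated bases, and an arbitrarily chosen correlated state; all names and numbers are illustrative.

```python
import math

def inner(u, v):
    """Scalar product, antilinear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def kron_vec(u, v):
    """Formal product u v, index i*len(v) + j."""
    return [a * b for a in u for b in v]

def rotated_basis(angle):
    """An orthonormal basis of a real two-dimensional space."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s], [-s, c]]

def joint_distribution(psi, basis1, basis2):
    """P_ij = |(phi_i theta_j, psi^S)|^2, as in (1.2)."""
    return [[abs(inner(kron_vec(phi, theta), psi)) ** 2 for theta in basis2]
            for phi in basis1]

# psi^S = 0.8 xi_0 eta_0 + 0.6 xi_1 eta_1 (a correlated state).
psi = [0.8, 0.0, 0.0, 0.6]
phis = rotated_basis(0.3)

marginal_by_angle = {}
for angle in (0.0, 0.7, 1.2):
    P = joint_distribution(psi, phis, rotated_basis(angle))
    marginal_by_angle[angle] = [sum(row) for row in P]   # sum over theta_j
    print(angle, marginal_by_angle[angle])
```

The joint distribution changes with the basis chosen in $S_2$, but the marginal over $\{\phi_i\}$ printed for each angle is the same.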

We now introduce the concept of a relative state-function, which will
play a central role in our interpretation of pure wave mechanics. Consider
a composite system $S = S_1 + S_2$ in the state $\psi^S$. To every state $\eta$ of
$S_2$ we associate a state of $S_1$, $\psi^\eta_{\text{rel}}$, called the relative state in $S_1$ for
$\eta$ in $S_2$, through:

(1.9)  Definition.  $\psi^\eta_{\text{rel}} = N \sum_i (\phi_i\eta,\psi^S)\,\phi_i$ ,

where $\{\phi_i\}$ is any complete orthonormal set in $S_1$ and $N$ is a
normalization constant.$^2$

The first property of $\psi^\eta_{\text{rel}}$ is its uniqueness,$^3$ i.e., its dependence
upon the choice of the basis $\{\phi_i\}$ is only apparent. To prove this, choose
another basis $\{\xi_k\}$, with $\phi_i = \sum_k b_{ik}\xi_k$. Then $\sum_i b_{ij}^* b_{ik} = \delta_{jk}$, and:

$\sum_i (\phi_i\eta,\psi^S)\,\phi_i = \sum_i \Big(\sum_j b_{ij}\xi_j\eta,\ \psi^S\Big)\Big(\sum_k b_{ik}\xi_k\Big)$
$= \sum_{jk} \Big(\sum_i b_{ij}^* b_{ik}\Big)(\xi_j\eta,\psi^S)\,\xi_k = \sum_{jk} \delta_{jk}\,(\xi_j\eta,\psi^S)\,\xi_k$
$= \sum_k (\xi_k\eta,\psi^S)\,\xi_k$ .

The second property of the relative state, which justifies its name, is
that $\psi^{\theta_j}_{\text{rel}}$ correctly gives the conditional expectations of all operators in
$S_1$, conditioned by the state $\theta_j$ in $S_2$. As before let $A$ be an operator
in $S_1$ with eigenstates $\phi_i$ and eigenvalues $\lambda_i$. Then:


$^2$ In case $\sum_i (\phi_i\eta,\psi^S)\,\phi_i = 0$ (unnormalizable) then choose any function for the
relative function. This ambiguity has no consequences of any importance to us.
See in this connection the remarks on p. 40.

$^3$ Except if $\sum_i (\phi_i\eta,\psi^S)\,\phi_i = 0$. There is still, of course, no dependence upon
the basis.
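(1.10)  $\langle A\rangle\psi^{\theta_j}_{\text{rel}} = \big(\psi^{\theta_j}_{\text{rel}},\ A\psi^{\theta_j}_{\text{rel}}\big)$
        $= \Big(N \sum_i (\phi_i\theta_j,\psi^S)\,\phi_i,\ A\,N \sum_m (\phi_m\theta_j,\psi^S)\,\phi_m\Big)$
        $= N^2 \sum_{im} \lambda_m\,(\phi_i\theta_j,\psi^S)^* (\phi_m\theta_j,\psi^S)\,\delta_{im}$
        $= N^2 \sum_i \lambda_i P_{ij}$ .

At this point the normalizer $N^2$ can be conveniently evaluated by using
(1.10) to compute: $\langle I^1\rangle\psi^{\theta_j}_{\text{rel}} = N^2 \sum_i 1 \cdot P_{ij} = N^2 P_j = 1$, so that

(1.11)  $N^2 = 1/P_j$ .

Substitution of (1.11) in (1.10) yields:

(1.12)  $\langle A\rangle\psi^{\theta_j}_{\text{rel}} = (1/P_j) \sum_i \lambda_i P_{ij} = \sum_i \lambda_i P_i^j = \mathrm{Exp}^{\theta_j}[A]$ ,

and we see that the conditional expectations of operators are given by the
relative states. (This includes, of course, the conditional distributions
themselves, since they may be obtained as expectations of projection
operators.)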
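The definition (1.9) and the result (1.12) can be tried out on a small example. The sketch below assumes a finite-dimensional composite system whose state is stored as a coefficient matrix `a[i][j]` in fixed bases of $S_1$ and $S_2$; the function name `relative_state` and the sample coefficients are illustrative.

```python
import math

def relative_state(a, eta):
    """Relative state in S1 for the state eta of S2, following (1.9).

    a[i][j] are the coefficients of psi^S in the product basis, and eta is
    given by its components in the same S2 basis. Returns the normalized
    relative state and the weight 1/N^2.
    """
    # (xi_i eta, psi^S) = sum_j conj(eta_j) a_ij
    raw = [sum(complex(e).conjugate() * a_ij for e, a_ij in zip(eta, row))
           for row in a]
    weight = sum(abs(c) ** 2 for c in raw)
    if weight == 0:
        raise ValueError("relative state is undefined (zero weight)")
    n = 1 / math.sqrt(weight)
    return [n * c for c in raw], weight

a = [[0.1, 0.7],
     [0.5, 0.5]]            # squared entries sum to 1
eigenvalues = [1.0, -1.0]   # eigenvalues of an operator A diagonal in the S1 basis

rel, P_j = relative_state(a, [1, 0])   # condition on the first basis state of S2

# Conditional distribution P_i^j = P_ij / P_j and the expectation (1.5).
conditional = [abs(a[i][0]) ** 2 / P_j for i in range(2)]
exp_conditional = sum(l * p for l, p in zip(eigenvalues, conditional))

# Expectation of A in the relative state, as in (1.12).
exp_relative = sum(l * abs(c) ** 2 for l, c in zip(eigenvalues, rel))
print(exp_conditional, exp_relative)
```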

An important representation of a composite system state $\psi^S$, in terms
of an orthonormal set $\{\theta_j\}$ in one subsystem $S_2$ and the set of relative
states $\{\psi^{\theta_j}_{\text{rel}}\}$ in $S_1$ is:

(1.13)  $\psi^S = \sum_{ij} (\phi_i\theta_j,\psi^S)\,\phi_i\theta_j = \sum_j \Big(\sum_i (\phi_i\theta_j,\psi^S)\,\phi_i\Big)\theta_j$
        $= \sum_j \frac{1}{N_j}\Big[N_j \sum_i (\phi_i\theta_j,\psi^S)\,\phi_i\Big]\theta_j$
        $= \sum_j \frac{1}{N_j}\,\psi^{\theta_j}_{\text{rel}}\,\theta_j$ ,  where $1/N_j^2 = P_j = \langle I^1[\theta_j]\rangle\psi^S$ .

Thus, for any orthonormal set in one subsystem, the state of the composite
system is a single superposition of elements consisting of a state of the
given set and its relative state in the other subsystem. (The relative
states, however, are not necessarily orthogonal.) We notice further that a
particular element, $\psi^{\theta_j}_{\text{rel}}\theta_j$, is quite independent of the choice of basis
$\{\theta_k\}$, $k \neq j$, for the orthogonal space of $\theta_j$, since $\psi^{\theta_j}_{\text{rel}}$ depends only on
$\theta_j$ and not on the other $\theta_k$ for $k \neq j$. We remark at this point that the
ambiguity in the relative state which arises when $\sum_i (\phi_i\theta_j,\psi^S)\,\phi_i = 0$
(see p. 38) is unimportant for this representation, since although any
state $\psi^{\theta_j}_{\text{rel}}$ can be regarded as the relative state in this case, the term
$\psi^{\theta_j}_{\text{rel}}\theta_j$ will occur in (1.13) with coefficient zero.

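Now that we have found subsystem states which correctly give
conditional expectations, we might inquire whether there exist subsystem states
which give marginal expectations. The answer is, unfortunately, no. Let
us compute the marginal expectation of $A$ in $S_1$ using the
representation (1.13):

(1.14)  $\mathrm{Exp}[A] = \langle A I^2\rangle\psi^S = \Big(\sum_j \frac{1}{N_j}\psi^{\theta_j}_{\text{rel}}\theta_j,\ A I^2 \sum_k \frac{1}{N_k}\psi^{\theta_k}_{\text{rel}}\theta_k\Big)$
        $= \sum_{jk} \frac{1}{N_j N_k}\big(\psi^{\theta_j}_{\text{rel}},\ A\psi^{\theta_k}_{\text{rel}}\big)\,\delta_{jk}$
        $= \sum_j \frac{1}{N_j^2}\big(\psi^{\theta_j}_{\text{rel}},\ A\psi^{\theta_j}_{\text{rel}}\big) = \sum_j P_j\,\langle A\rangle\psi^{\theta_j}_{\text{rel}}$ .

Now suppose that there exists a state in $S_1$, $\psi'$, which correctly gives
the marginal expectation (1.14) for all operators $A$ (i.e., such that
$\mathrm{Exp}[A] = \langle A\rangle\psi'$ for all $A$). One such operator is $[\psi']$, the projection
on $\psi'$, for which $\langle[\psi']\rangle\psi' = 1$. But, from (1.14) we have that
$\mathrm{Exp}[\psi'] = \sum_j P_j\,|(\psi',\psi^{\theta_j}_{\text{rel}})|^2$, which is $< 1$ unless, for all $j$, $P_j = 0$ or
$\psi^{\theta_j}_{\text{rel}} = \psi'$, a condition which is not generally true. Therefore there exists
in general no state for $S_1$ which correctly gives the marginal expectations
for all operators in $S_1$.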
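As a concrete instance of this argument (an illustrative example, not taken from the text), suppose $S_1$ and $S_2$ are each two-dimensional and take the equally weighted correlated state below. Choosing $\theta_j = \eta_j$ in (1.13), the relative states are $\xi_j$ with $P_j = \tfrac12$, and for any candidate state $\psi'$ of $S_1$ the marginal expectation of its projection is:

```latex
\psi^S = \tfrac{1}{\sqrt{2}}\,(\xi_1\eta_1 + \xi_2\eta_2), \qquad
\mathrm{Exp}\big[[\psi']\big]
  = \sum_j P_j\,\big|(\psi',\psi^{\eta_j}_{\mathrm{rel}})\big|^2
  = \tfrac12\,|(\psi',\xi_1)|^2 + \tfrac12\,|(\psi',\xi_2)|^2
  = \tfrac12 \;<\; 1 .
```

The last step uses the completeness of $\{\xi_1,\xi_2\}$ in $S_1$, so in this example no choice of $\psi'$ reproduces the marginal expectations.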
However, even though there is generally no single state describing
marginal expectations, we see that there is always a mixture of states,
namely the states $\psi^{\theta_j}_{\text{rel}}$ weighted with $P_j$, which does yield the correct
expectations. The distinction between a mixture, $M$, of states $\phi_i$,
weighted by $P_i$, and a pure state $\psi$ which is a superposition,
$\psi = \sum_i a_i\phi_i$, is that there are no interference phenomena between the various
states of a mixture. The expectation of an operator $A$ for the mixture is
$\mathrm{Exp}^M[A] = \sum_i P_i\,\langle A\rangle\phi_i = \sum_i P_i\,(\phi_i, A\phi_i)$, while the expectation for the
pure state $\psi$ is $\langle A\rangle\psi = \big(\sum_i a_i\phi_i,\ A\sum_j a_j\phi_j\big) = \sum_{ij} a_i^* a_j\,(\phi_i, A\phi_j)$,
which is not the same as that of the mixture with weights $P_i = a_i^* a_i$, due
to the presence of the interference terms $(\phi_i, A\phi_j)$ for $j \neq i$.
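The difference between the two expectations is easy to see numerically. The following sketch assumes a two-dimensional system, a superposition with real coefficients, and an operator with only off-diagonal elements in the chosen basis; these choices are illustrative.

```python
def inner(u, v):
    """Scalar product, antilinear in the first argument."""
    return sum(complex(x).conjugate() * y for x, y in zip(u, v))

def apply(M, v):
    """Matrix-vector product."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def expectation(A, psi):
    """<A>psi = (psi, A psi)."""
    return inner(psi, apply(A, psi))

phi = [[1.0, 0.0], [0.0, 1.0]]       # orthonormal states phi_i
a = [0.6, 0.8]                       # coefficients of the superposition
psi = [a[0] * phi[0][k] + a[1] * phi[1][k] for k in range(2)]

A = [[0.0, 1.0], [1.0, 0.0]]         # (phi_i, A phi_j) is nonzero only for i != j

pure = expectation(A, psi)
mixture = sum(abs(a_i) ** 2 * expectation(A, phi_i) for a_i, phi_i in zip(a, phi))
print("pure state:", pure, "mixture:", mixture)
```

For this operator the whole expectation in the pure state comes from the interference terms, and the mixture with the same weights gives zero.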

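It is convenient to represent such a mixture by a density matrix,$^4$ $\rho$.
If the mixture consists of the states $\psi_j$ weighted by $P_j$, and if we are
working in a basis consisting of the complete orthonormal set $\{\phi_i\}$, where
$\psi_j = \sum_i a_i^j\phi_i$, then we define the elements of the density matrix for the
mixture to be:

(1.15)  $\rho_{k\ell} = \sum_j P_j\, a_\ell^{j*} a_k^j \qquad \big(a_i^j = (\phi_i,\psi_j)\big)$ .

Then if $A$ is any operator, with matrix representation $A_{i\ell} = (\phi_i, A\phi_\ell)$
in the chosen basis, its expectation for the mixture is:

(1.16)  $\mathrm{Exp}^M[A] = \sum_j P_j\,(\psi_j, A\psi_j) = \sum_j P_j \Big[\sum_{i\ell} a_i^{j*} a_\ell^j\,(\phi_i, A\phi_\ell)\Big]$
        $= \sum_{i\ell} \Big(\sum_j P_j\, a_i^{j*} a_\ell^j\Big)(\phi_i, A\phi_\ell) = \sum_{i\ell} \rho_{\ell i} A_{i\ell}$
        $= \mathrm{Trace}\,(\rho A)$ .

$^4$ Also called a statistical operator (von Neumann [17]).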
Therefore any mixture is adequately represented by a density matrix.$^5$
Note also that $\rho_{k\ell}^* = \rho_{\ell k}$, so that $\rho$ is Hermitian.

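Let us now find the density matrices $\rho^1$ and $\rho^2$ for the subsystems
$S_1$ and $S_2$ of a system $S = S_1 + S_2$ in the state $\psi^S$. Furthermore, let
us choose the orthonormal bases $\{\xi_i\}$ and $\{\eta_j\}$ in $S_1$ and $S_2$
respectively, and let $A$ be an operator in $S_1$, $B$ an operator in $S_2$. Then:

(1.17)  $\mathrm{Exp}[A] = \langle A I^2\rangle\psi^S$
        $= \Big(\sum_{ij} (\xi_i\eta_j,\psi^S)\,\xi_i\eta_j,\ A I^2 \sum_{\ell m} (\xi_\ell\eta_m,\psi^S)\,\xi_\ell\eta_m\Big)$
        $= \sum_{ij\ell m} (\xi_i\eta_j,\psi^S)^* (\xi_\ell\eta_m,\psi^S)\,(\xi_i, A\xi_\ell)(\eta_j,\eta_m)$
        $= \sum_{i\ell} \Big[\sum_j (\xi_i\eta_j,\psi^S)^* (\xi_\ell\eta_j,\psi^S)\Big](\xi_i, A\xi_\ell)$
        $= \mathrm{Trace}\,(\rho^1 A)$ ,

where we have defined $\rho^1$ in the $\{\xi_i\}$ basis to be:

(1.18)  $\rho^1_{\ell i} = \sum_j (\xi_i\eta_j,\psi^S)^* (\xi_\ell\eta_j,\psi^S)$ .

In a similar fashion we find that $\rho^2$ is given, in the $\{\eta_j\}$ basis, by:

(1.19)  $\rho^2_{mn} = \sum_i (\xi_i\eta_n,\psi^S)^* (\xi_i\eta_m,\psi^S)$ .

It can be easily shown that here again the dependence of $\rho^1$ upon the
choice of basis $\{\eta_i\}$ in $S_2$, and of $\rho^2$ upon $\{\xi_i\}$, is only apparent.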
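Formulas (1.17) to (1.19) translate directly into a few lines of code. The sketch below assumes a finite-dimensional composite state stored as a coefficient matrix `a[i][j]` $= (\xi_i\eta_j,\psi^S)$; the function names and the sample numbers are illustrative.

```python
def reduced_density_1(a):
    """rho^1[l][i] = sum_j conj(a_ij) a_lj, as in (1.18)."""
    n, m = len(a), len(a[0])
    return [[sum(complex(a[i][j]).conjugate() * a[l][j] for j in range(m))
             for i in range(n)] for l in range(n)]

def reduced_density_2(a):
    """rho^2[m][n] = sum_i conj(a_in) a_im, as in (1.19)."""
    rows, cols = len(a), len(a[0])
    return [[sum(complex(a[i][n]).conjugate() * a[i][m] for i in range(rows))
             for n in range(cols)] for m in range(cols)]

def trace_product(rho, A):
    """Trace(rho A) = sum_{l,i} rho[l][i] A[i][l]."""
    size = len(rho)
    return sum(rho[l][i] * A[i][l] for l in range(size) for i in range(size))

def marginal_expectation(a, A):
    """<A I^2>psi^S computed directly from the composite state."""
    n, m = len(a), len(a[0])
    return sum(complex(a[i][j]).conjugate() * A[i][l] * a[l][j]
               for i in range(n) for l in range(n) for j in range(m))

a = [[0.1, 0.7],
     [0.5, 0.5]]
A = [[1.0, 2.0],
     [2.0, -1.0]]

rho1 = reduced_density_1(a)
rho2 = reduced_density_2(a)
print(trace_product(rho1, A), marginal_expectation(a, A))
```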


$^5$ A better, coordinate free representation of a mixture is in terms of the
operator which the density matrix represents. For a mixture of states $\psi_n$ (not
necessarily orthogonal) with weights $p_n$, the density operator is $\rho = \sum_n p_n[\psi_n]$, where
$[\psi_n]$ stands for the projection operator on $\psi_n$.
In summary, we have seen in this section that a state of a composite
system leads to joint distributions over subsystem quantities which are
generally not independent. Conditional distributions and expectations for
subsystems are obtained from relative states, and subsystem marginal
distributions and expectations are given by density matrices.

There does not, in general, exist anything like a single state for one
subsystem of a composite system. That is, subsystems do not possess
states independent of the states of the remainder of the system, so that
the subsystem states are generally correlated. One can arbitrarily choose
a state for one subsystem, and be led to the relative state for the other
subsystem. Thus we are faced with a fundamental relativity of states,
which is implied by the formalism of composite systems. It is
meaningless to ask the absolute state of a subsystem; one can only ask the
state relative to a given state of the remainder of the system.

§2.  Information  and  correlation  in  quantum  mechanics 

We wish to be able to discuss information and correlation for
Hermitian operators $A$, $B$, ..., with respect to a state function $\psi$. These
quantities are to be computed, through the formulas of the preceding
chapter, from the square amplitudes of the coefficients of the expansion
of $\psi$ in terms of the eigenstates of the operators.

We have already seen (p. 34) that a state $\psi$ and an orthonormal basis
$\{\phi_i\}$ leads to a square amplitude distribution of $\psi$ over the set $\{\phi_i\}$:

(2.1)  $P_i = |(\phi_i,\psi)|^2 = \langle[\phi_i]\rangle\psi$ ,

so that we can define the information of the basis $\{\phi_i\}$ for the state $\psi$,
$I_{\{\phi_i\}}(\psi)$, to be simply the information of this distribution relative to the
uniform measure:

(2.2)  $I_{\{\phi_i\}}(\psi) = \sum_i P_i \ln P_i = \sum_i |(\phi_i,\psi)|^2 \ln |(\phi_i,\psi)|^2$ .
We define the information of an operator $A$, for the state $\psi$, $I_A(\psi)$,
to be the information in the square amplitude distribution over its
eigenvalues, i.e., the information of the probability distribution over the results
of a determination of $A$ which is prescribed in the probabilistic
interpretation. For a non-degenerate operator $A$ this distribution is the same as
the distribution (2.1) over the eigenstates. But because the information
is dependent only on the distribution, and not on numerical values, the
information of the distribution over eigenvalues of $A$ is precisely the
information of the eigenbasis of $A$, $\{\phi_i\}$. Therefore:

(2.3)  $I_A(\psi) = I_{\{\phi_i\}}(\psi) = \sum_i \langle[\phi_i]\rangle\psi \,\ln \langle[\phi_i]\rangle\psi$   ($A$ non-degenerate) .

We see that for fixed $\psi$, the information of all non-degenerate operators
having the same set of eigenstates is the same.

In the case of degenerate operators it will be convenient to take, as
the definition of information, the information of the square amplitude
distribution over the eigenvalues relative to the information measure which
consists of the multiplicity of the eigenvalues, rather than the uniform
measure. This definition preserves the choice of uniform measure over
the eigenstates, in distinction to the eigenvalues. If $\phi_{ij}$ ($j$ from 1 to $m_i$)
are a complete orthonormal set of eigenstates for $A'$, with distinct
eigenvalues $\lambda_i$ (degenerate with respect to $j$), then the multiplicity of the $i$th
eigenvalue is $m_i$ and the information $I_{A'}(\psi)$ is defined to be:

(2.4)  $I_{A'}(\psi) = \sum_i \Big(\sum_j \langle[\phi_{ij}]\rangle\psi\Big) \ln \dfrac{\sum_j \langle[\phi_{ij}]\rangle\psi}{m_i}$ .

The usefulness of this definition lies in the fact that any operator $A''$
which distinguishes further between any of the degenerate states of $A'$
leads to a refinement of the relative density, in the sense of Theorem 4,
and consequently has equal or greater information. A non-degenerate
operator thus represents the maximal refinement and possesses maximal
information.
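The effect of the multiplicity measure in (2.4) can be seen on a small example. The sketch below assumes four eigenstates with made-up square amplitudes, a non-degenerate operator that resolves all four, and a degenerate operator that lumps them into two doubly degenerate eigenvalues; the function name is illustrative.

```python
import math

def operator_information(probs, multiplicities):
    """Information sum_i P_i ln(P_i / m_i), as in (2.4)."""
    return sum(p * math.log(p / m)
               for p, m in zip(probs, multiplicities) if p > 0)

# Square amplitudes over four eigenstates (illustrative values).
fine = [0.5, 0.3, 0.15, 0.05]
i_fine = operator_information(fine, [1, 1, 1, 1])

# A degenerate operator lumping eigenstates (0, 1) and (2, 3) together.
coarse = [fine[0] + fine[1], fine[2] + fine[3]]
i_coarse = operator_information(coarse, [2, 2])

print("non-degenerate:", round(i_fine, 4), "degenerate:", round(i_coarse, 4))

# If the amplitudes are uniform within each degenerate eigenspace,
# removing the degeneracy gains nothing.
flat = [0.4, 0.4, 0.1, 0.1]
i_flat_fine = operator_information(flat, [1, 1, 1, 1])
i_flat_coarse = operator_information([0.8, 0.2], [2, 2])
```

In this example the operator that removes the degeneracy has the larger information, and the two agree when the distribution is uniform within each eigenspace.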
It is convenient to introduce a new notation for the projection
operators which are relevant for a specified operator. As before let $A$ have
eigenfunctions $\phi_{ij}$ and distinct eigenvalues $\lambda_i$. Then define the
projections $A_i$, the projections on the eigenspaces of different eigenvalues of
$A$, to be:

(2.5)  $A_i = \sum_{j=1}^{m_i} [\phi_{ij}]$ .

To each such projection there is associated a number $m_i$, the multiplicity
of the degeneracy, which is the dimension of the $i$th eigenspace. In this
notation the distribution over the eigenvalues of $A$ for the state $\psi$, $P_i$,
becomes simply:

(2.6)  $P_i = P(\lambda_i) = \langle A_i\rangle\psi$ ,

and the information, given by (2.4), becomes:

(2.7)  $I_A = \sum_i \langle A_i\rangle\psi \,\ln \dfrac{\langle A_i\rangle\psi}{m_i}$ .

Similarly, for a pair of operators, $A$ in $S_1$ and $B$ in $S_2$, for the
composite system $S = S_1 + S_2$ with state $\psi^S$, the joint distribution over
eigenvalues is:

(2.8)  $P_{ij} = P(\lambda_i,\mu_j) = \langle A_i B_j\rangle\psi^S$ ,

and the marginal distributions are:

(2.9)  $P_i = \sum_j P_{ij} = \langle A_i\big(\sum_j B_j\big)\rangle\psi^S = \langle A_i I^2\rangle\psi^S$ ,
       $P_j = \sum_i P_{ij} = \langle\big(\sum_i A_i\big)B_j\rangle\psi^S = \langle I^1 B_j\rangle\psi^S$ .

The joint information, $I_{AB}$, is given by:

(2.10)  $I_{AB} = \sum_{ij} P_{ij} \ln \dfrac{P_{ij}}{m_i n_j} = \sum_{ij} \langle A_i B_j\rangle\psi^S \,\ln \dfrac{\langle A_i B_j\rangle\psi^S}{m_i n_j}$ ,
where $m_i$ and $n_j$ are the multiplicities of the eigenvalues $\lambda_i$ and $\mu_j$.
The marginal information quantities are given by:

(2.11)  $I_A = \sum_i \langle A_i I^2\rangle\psi^S \,\ln \dfrac{\langle A_i I^2\rangle\psi^S}{m_i}$ ,
        $I_B = \sum_j \langle I^1 B_j\rangle\psi^S \,\ln \dfrac{\langle I^1 B_j\rangle\psi^S}{n_j}$ ,

and finally the correlation, $\{A,B\}\psi^S$, is given by:

(2.12)  $\{A,B\}\psi^S = \sum_{ij} P_{ij} \ln \dfrac{P_{ij}}{P_i P_j} = \sum_{ij} \langle A_i B_j\rangle\psi^S \,\ln \dfrac{\langle A_i B_j\rangle\psi^S}{\langle A_i I^2\rangle\psi^S\,\langle I^1 B_j\rangle\psi^S}$ ,

where we note that the expression does not involve the multiplicities, as
do the information expressions, a circumstance which simply reflects the
independence of correlation on any information measure. These
expressions of course generalize trivially to distributions over more than two
variables (composite systems of more than two subsystems).

In addition to the correlation of pairs of subsystem operators, given
by (2.12), there always exists a unique quantity $\{S_1,S_2\}$, the canonical
correlation, which has some special properties and may be regarded as
the fundamental correlation between the two subsystems $S_1$ and $S_2$ of
the composite system $S$. As we remarked earlier a density matrix is
Hermitian, so that there is a representation in which it is diagonal.$^6$ In


$^6$ The density matrix of a subsystem always has a pure discrete spectrum, if
the composite system is in a state. To see this we note that the choice of any
orthonormal basis in $S_2$ leads to a discrete (i.e., denumerable) set of relative
states in $S_1$. The density matrix in $S_1$ then represents this discrete mixture,
$\psi^{\theta_j}_{\text{rel}}$ weighted by $P_j$. This means that the expectation of the identity,
$\mathrm{Exp}[I] = \sum_j P_j\,(\psi^{\theta_j}_{\text{rel}},\ I\psi^{\theta_j}_{\text{rel}}) = \sum_j P_j = 1 = \mathrm{Trace}\,(\rho I) = \mathrm{Trace}\,(\rho)$.
Therefore $\rho$ has a finite trace and is a completely continuous operator, having
necessarily a pure discrete spectrum. (See von Neumann [17], p. 89, footnote 115.)
particular, for the decomposition of $S$ (with state $\psi^S$) into $S_1$ and $S_2$,
we can choose a representation in which both $\rho^{S_1}$ and $\rho^{S_2}$ are diagonal.
(This choice is always possible because $\rho^{S_1}$ is independent of the basis
in $S_2$ and vice-versa.) Such a representation will be called a canonical
representation. This means that it is always possible to represent the
state $\psi^S$ by a single superposition:

(2.13)  $\psi^S = \sum_i a_i\,\xi_i\eta_i$ ,

where both the $\{\xi_i\}$ and the $\{\eta_i\}$ constitute orthonormal sets of states
for $S_1$ and $S_2$ respectively.

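To construct such a representation choose the basis $\{\eta_i\}$ for $S_2$ so
that $\rho^{S_2}$ is diagonal:

(2.14)  $\rho^{S_2}_{ij} = \lambda_i\,\delta_{ij}$ ,

and let the $\xi_i$ be the relative states in $S_1$ for the $\eta_i$ in $S_2$:

(2.15)  $\xi_i = N_i \sum_j (\phi_j\eta_i,\psi^S)\,\phi_j$   (any basis $\{\phi_j\}$) .

Then, according to (1.13), $\psi^S$ is represented in the form (2.13) where the
$\{\eta_i\}$ are orthonormal by choice, and the $\{\xi_i\}$ are normal since they are
relative states. We therefore need only show that the states $\{\xi_i\}$ are
orthogonal:

(2.16)  $(\xi_j,\xi_k) = \Big(N_j \sum_\ell (\phi_\ell\eta_j,\psi^S)\,\phi_\ell,\ N_k \sum_m (\phi_m\eta_k,\psi^S)\,\phi_m\Big)$
        $= N_j^* N_k \sum_{\ell m} (\phi_\ell\eta_j,\psi^S)^* (\phi_m\eta_k,\psi^S)\,\delta_{\ell m}$
        $= N_j^* N_k \sum_\ell (\phi_\ell\eta_j,\psi^S)^* (\phi_\ell\eta_k,\psi^S)$
        $= N_j^* N_k\,\rho^{S_2}_{kj} = N_j^* N_k\,\lambda_k\,\delta_{kj} = 0$ ,  for $j \neq k$ ,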
since we supposed $\rho^{S_2}$ to be diagonal in this representation. We have
therefore constructed a canonical representation (2.13).

The density matrix $\rho^{S_1}$ is also automatically diagonal, by the choice
of representation consisting of the basis in $S_2$ which makes $\rho^{S_2}$
diagonal and the corresponding relative states in $S_1$. Since $\{\xi_i\}$ are
orthonormal we have:

(2.17)  $\rho^{S_1}_{ij} = \sum_k (\xi_i\eta_k,\psi^S)^* (\xi_j\eta_k,\psi^S)$
        $= \sum_k \Big(\xi_i\eta_k,\ \sum_m a_m\,\xi_m\eta_m\Big)^* \Big(\xi_j\eta_k,\ \sum_\ell a_\ell\,\xi_\ell\eta_\ell\Big)$
        $= \sum_{k\ell m} a_m^* a_\ell\,\delta_{im}\delta_{km}\delta_{j\ell}\delta_{k\ell} = \sum_k a_k^* a_k\,\delta_{ki}\delta_{kj}$
        $= a_i^* a_i\,\delta_{ij} = P_i\,\delta_{ij}$ ,

where $P_i = a_i^* a_i$ is the marginal distribution over the $\{\xi_i\}$. Similar
computation shows that the elements of $\rho^{S_2}$ are the same:

(2.18)  $\rho^{S_2}_{k\ell} = a_k^* a_k\,\delta_{k\ell} = P_k\,\delta_{k\ell}$ .

Thus in the canonical representation both density matrices are diagonal
and have the same elements, $P_k$, which give the marginal square
amplitude distribution over both of the sets $\{\xi_i\}$ and $\{\eta_i\}$ forming the basis
of the representation.

Now, any pair of operators, $A$ in $S_1$ and $B$ in $S_2$, which have as
non-degenerate eigenfunctions the sets $\{\xi_i\}$ and $\{\eta_j\}$ (i.e., operators
which define the canonical representation), are "perfectly" correlated in
the sense that there is a one-one correspondence between their
eigenvalues. The joint square amplitude distribution for eigenvalues $\lambda_i$ of $A$
and $\mu_j$ of $B$ is:

(2.19)  $P(\lambda_i \text{ and } \mu_j) = P(\xi_i \text{ and } \eta_j) = P_{ij} = a_i^* a_i\,\delta_{ij} = P_i\,\delta_{ij}$ .
Therefore, the correlation between these operators, $\{A,B\}\psi^S$, is:

(2.20)  $\{A,B\}\psi^S = \sum_{ij} P(\lambda_i \text{ and } \mu_j) \ln \dfrac{P(\lambda_i \text{ and } \mu_j)}{P(\lambda_i)\,P(\mu_j)} = \sum_{ij} P_i\,\delta_{ij} \ln \dfrac{P_i\,\delta_{ij}}{P_i P_j}$
        $= -\sum_i P_i \ln P_i$ .

We shall denote this quantity by $\{S_1,S_2\}\psi^S$ and call it the canonical
correlation of the subsystems $S_1$ and $S_2$ for the system state $\psi^S$. It
is the correlation between any pair of non-degenerate subsystem operators
which define the canonical representation.

In the canonical representation, where the density matrices are
diagonal ((2.17) and (2.18)), the canonical correlation is given by:

(2.21)  $\{S_1,S_2\}\psi^S = -\sum_i P_i \ln P_i = -\mathrm{Trace}\,(\rho^{S_1} \ln \rho^{S_1})$
        $= -\mathrm{Trace}\,(\rho^{S_2} \ln \rho^{S_2})$ .

But the trace is invariant for unitary transformations, so that (2.21) holds
independently of the representation, and we have therefore established
the uniqueness of $\{S_1,S_2\}\psi^S$.

It is also interesting to note that the quantity $-\mathrm{Trace}\,(\rho \ln \rho)$ is
(apart from a factor of Boltzmann's constant) just the entropy of a mixture
of states characterized by the density matrix $\rho$.$^7$ Therefore the entropy
of the mixture characteristic of a subsystem $S_1$ for the state
$\psi^S = \psi^{S_1+S_2}$ is exactly matched by a correlation information $\{S_1,S_2\}$, which
represents the correlation between any pair of operators $A$, $B$, which
define the canonical representation. The situation is thus quite similar
to that of classical mechanics.$^8$

$^7$ See von Neumann [17], p. 296.

$^8$ Cf. Chapter II, §7.
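The canonical correlation can be evaluated for a small state and compared with the correlation (2.12) of other operator pairs. The sketch below assumes a two-term canonical state with real coefficients and real rotated bases in each subsystem; the names and numerical values are illustrative.

```python
import math

def rotated_basis(angle):
    """An orthonormal basis of a real two-dimensional space."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, s], [-s, c]]

def joint_distribution(a, basis1, basis2):
    """P_kl = |(phi_k theta_l, psi^S)|^2 for psi^S = sum_i a_i xi_i eta_i (real case)."""
    return [[sum(a_i * phi[i] * theta[i] for i, a_i in enumerate(a)) ** 2
             for theta in basis2] for phi in basis1]

def correlation(joint):
    """{A,B} = sum_ij P_ij ln(P_ij / (P_i P_j)), as in (2.12)."""
    p_row = [sum(row) for row in joint]
    p_col = [sum(col) for col in zip(*joint)]
    return sum(p * math.log(p / (p_row[i] * p_col[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

a = [0.8, 0.6]   # canonical coefficients, so P = (0.64, 0.36)

# -sum_i P_i ln P_i, i.e. -Trace(rho ln rho) in the canonical representation.
canonical = -sum(x * x * math.log(x * x) for x in a)

in_canonical_bases = correlation(
    joint_distribution(a, rotated_basis(0.0), rotated_basis(0.0)))
in_rotated_bases = correlation(
    joint_distribution(a, rotated_basis(0.4), rotated_basis(0.9)))

print(round(canonical, 4), round(in_canonical_bases, 4), round(in_rotated_bases, 4))
```

For operators defining the canonical representation the correlation coincides with $-\sum_i P_i \ln P_i$; for the rotated pair tried here it comes out smaller.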
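Another special property of the canonical representation is that any
operators $\tilde A$, $\tilde B$ defining a canonical representation have maximum
marginal information, in the sense that for any other discrete spectrum
operators, $A$ on $S_1$, $B$ on $S_2$, $I_A \leq I_{\tilde A}$ and $I_B \leq I_{\tilde B}$. If the canonical
representation is (2.13), with $\{\xi_i\}$, $\{\eta_i\}$ non-degenerate eigenfunctions of $\tilde A$,
$\tilde B$, respectively, and $A$, $B$ any pair of non-degenerate operators with
eigenfunctions $\{\phi_k\}$ and $\{\theta_\ell\}$, where $\xi_i = \sum_k c_{ik}\phi_k$, $\eta_i = \sum_\ell d_{i\ell}\theta_\ell$,
then $\psi^S$ in $\phi$, $\theta$ representation is:

(2.22)  $\psi^S = \sum_i a_i\,\xi_i\eta_i = \sum_{ik\ell} a_i c_{ik} d_{i\ell}\,\phi_k\theta_\ell = \sum_{k\ell} \Big(\sum_i a_i c_{ik} d_{i\ell}\Big)\phi_k\theta_\ell$ ,

and the joint square amplitude distribution for $\phi_k$, $\theta_\ell$ is:

(2.23)  $P_{k\ell} = \Big|\sum_i a_i c_{ik} d_{i\ell}\Big|^2 = \sum_{im} a_i^* a_m\, c_{ik}^* c_{mk}\, d_{i\ell}^* d_{m\ell}$ ,

while the marginals are

(2.24)  $P_k = \sum_\ell P_{k\ell} = \sum_{im} a_i^* a_m\, c_{ik}^* c_{mk} \sum_\ell d_{i\ell}^* d_{m\ell}$
        $= \sum_{im} a_i^* a_m\, c_{ik}^* c_{mk}\,\delta_{im} = \sum_i a_i^* a_i\, c_{ik}^* c_{ik}$ ,

and similarly

(2.25)  $P_\ell = \sum_k P_{k\ell} = \sum_i a_i^* a_i\, d_{i\ell}^* d_{i\ell}$ .

Then the marginal information $I_A$ is:

(2.26)  $I_A = \sum_k P_k \ln P_k = \sum_k \Big(\sum_i a_i^* a_i\, c_{ik}^* c_{ik}\Big) \ln \Big(\sum_i a_i^* a_i\, c_{ik}^* c_{ik}\Big)$
        $= \sum_k \Big(\sum_i a_i^* a_i\, T_{ik}\Big) \ln \Big(\sum_i a_i^* a_i\, T_{ik}\Big)$ ,

where $T_{ik} = c_{ik}^* c_{ik}$ is doubly-stochastic ($\sum_i T_{ik} = \sum_k T_{ik} = 1$ follows
from unitary nature of the $c_{ik}$). Therefore (by Corollary 2, §4, Appendix I):

(2.27)  $I_A = \sum_k \Big(\sum_i a_i^* a_i\, T_{ik}\Big) \ln \Big(\sum_i a_i^* a_i\, T_{ik}\Big)$
        $\leq \sum_i a_i^* a_i \ln a_i^* a_i = I_{\tilde A}$ ,

and we have proved that $\tilde A$ has maximal marginal information among the
discrete spectrum operators. Identical proof holds for $\tilde B$.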

While this result was proved only for non-degenerate operators, it is
immediately extended to the degenerate case, since as a consequence of
our definition of information for a degenerate operator, (2.4), its
information is still less than that of an operator which removes the degeneracy.
We have thus proved:

THEOREM. $I_A \leq I_{\tilde A}$, where $\tilde A$ is any non-degenerate operator defining
the canonical representation, and $A$ is any operator with discrete
spectrum.
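The inequality of the theorem can be checked in the simplest case. The sketch below assumes a two-dimensional subsystem, canonical weights $P_i = a_i^* a_i$ chosen arbitrarily, and a real rotation relating the eigenbasis of $A$ to that of $\tilde A$, so that $T_{ik}$ is the doubly-stochastic matrix of squared cosines and sines; all names are illustrative.

```python
import math

def information(probs):
    """Information relative to the uniform measure: sum of p ln p."""
    return sum(p * math.log(p) for p in probs if p > 0)

def marginal_in_rotated_basis(weights, angle):
    """P_k = sum_i P_i T_ik with T_ik = |c_ik|^2, as in (2.24) and (2.26)."""
    c, s = math.cos(angle), math.sin(angle)
    T = [[c * c, s * s],
         [s * s, c * c]]          # rows and columns each sum to 1
    return [sum(weights[i] * T[i][k] for i in range(2)) for k in range(2)]

weights = [0.9, 0.1]              # canonical marginal distribution P_i
i_canonical = information(weights)

results = []
for angle in (0.0, 0.2, 0.5, math.pi / 4, 1.3):
    i_rotated = information(marginal_in_rotated_basis(weights, angle))
    results.append((angle, i_rotated))
    print(round(angle, 3), round(i_rotated, 4), "<=", round(i_canonical, 4))
```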

We conclude the discussion of the canonical representation by
conjecturing that in addition to the maximum marginal information properties of
$\tilde A$, $\tilde B$, which define the representation, they are also maximally correlated,
by which we mean that for any pair of operators $C$ in $S_1$, $D$ in $S_2$,
$\{C,D\} \leq \{\tilde A,\tilde B\}$, i.e.:

(2.28)  Conjecture.$^9$  $\{C,D\}\psi^S \leq \{\tilde A,\tilde B\}\psi^S = \{S_1,S_2\}\psi^S$

for all $C$ on $S_1$, $D$ on $S_2$.

As  a  final  topic  for  this  section  we  point  out  that  the  uncertainty 
principle  can  probably  be  phrased  in  a  stronger  form  in  terms  of  informa¬ 
tion.  The  usual  form  of  this  principle  is  stated  in  terms  of  variances, 
namely: 

9  The relations $\{C,B\} \le \{A,B\} = \{S_1,S_2\}$ and $\{A,D\} \le \{S_1,S_2\}$ for all C on $S_1$, D on $S_2$, can be proved easily in a manner analogous to (2.27). These do not, however, necessarily imply the general relation (2.28).
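(2.29)  $\sigma_x^2\,\sigma_k^2 \ge \frac{1}{4}$  for all $\psi(x)$,

where $\sigma_x^2 = \langle x^2\rangle_\psi - [\langle x\rangle_\psi]^2$ and

$\sigma_k^2 = \Bigl\langle\Bigl(-i\frac{\partial}{\partial x}\Bigr)^2\Bigr\rangle_\psi - \Bigl[\Bigl\langle -i\frac{\partial}{\partial x}\Bigr\rangle_\psi\Bigr]^2$.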

The  conjectured  information  form  of  this  principle  is: 

(2.30)  $I_x + I_k \le \ln(1/\pi e)$  for all $\psi(x)$.


Although this inequality has not yet been proved with complete rigor, it is made highly probable by the circumstance that equality holds for $\psi(x)$ of the form $\psi(x) = (1/2\pi\sigma_x^2)^{1/4} \exp[-x^2/4\sigma_x^2]$, the so-called “minimum uncertainty packets” which give normal distributions for both position and momentum, and that furthermore the first variation of $(I_x + I_k)$ vanishes for such $\psi(x)$. (See Appendix I, §6.) Thus, although $\ln(1/\pi e)$ has not been proved an absolute maximum of $I_x + I_k$, it is at least a stationary value.

The principle (2.30) is stronger than (2.29), since it implies (2.29) but is not implied by it. To see that it implies (2.29) we use the well known fact (easily established by a variation calculation) that, for fixed variance $\sigma^2$, the distribution of minimum information is a normal distribution, which has information $I = \ln(1/\sigma\sqrt{2\pi e})$. This gives us the general inequality involving information and variance:

(2.31)  $I \ge \ln(1/\sigma\sqrt{2\pi e})$  (for all distributions).


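Substitution of (2.31) into (2.30) then yields:

(2.32)  $\ln(1/\sigma_x\sqrt{2\pi e}) + \ln(1/\sigma_k\sqrt{2\pi e}) \le I_x + I_k \le \ln(1/\pi e)$
        $\Rightarrow\ (1/\sigma_x\sigma_k 2\pi e) \le (1/\pi e)\ \Rightarrow\ \sigma_x^2\sigma_k^2 \ge \frac{1}{4}$,

so that our principle implies the standard principle (2.29).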
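The equality case of (2.30) for a minimum uncertainty packet can be illustrated with a few lines of arithmetic. This is a sketch under stated assumptions: the position and momentum distributions are taken to be normal with $\sigma_x\sigma_k = 1/2$, the value 0.8 is arbitrary, and the function names are hypothetical.

```python
import math

def normal_information(sigma):
    """Closed form I = ln(1 / (sigma * sqrt(2*pi*e))) for a normal density."""
    return math.log(1.0 / (sigma * math.sqrt(2.0 * math.pi * math.e)))

def numeric_information(sigma, half_width=12.0, steps=20001):
    """Riemann-sum estimate of the integral of P ln P for the same normal density."""
    lo = -half_width * sigma
    dx = 2.0 * half_width * sigma / (steps - 1)
    total = 0.0
    for n in range(steps):
        x = lo + n * dx
        p = math.exp(-x * x / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))
        if p > 0.0:
            total += p * math.log(p) * dx
    return total

sigma_x = 0.8                      # illustrative width
sigma_k = 1.0 / (2.0 * sigma_x)    # minimum uncertainty packet: sigma_x * sigma_k = 1/2
total = normal_information(sigma_x) + normal_information(sigma_k)
bound = math.log(1.0 / (math.pi * math.e))
print(total, bound)                                             # agree up to rounding
print(numeric_information(sigma_x), normal_information(sigma_x))  # integral vs closed form
```

The sum $I_x + I_k$ does not depend on the chosen width, since only the product $\sigma_x\sigma_k$ enters.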
To show that (2.29) does not imply (2.30) it suffices to give a counterexample. The distributions $P(x) = \frac{1}{2}\delta(x) + \frac{1}{2}\delta(x-10)$ and $P(k) = \frac{1}{2}\delta(k) + \frac{1}{2}\delta(k-10)$, which consist simply of spikes at 0 and 10, clearly satisfy (2.29), while they both have infinite information and thus do not satisfy (2.30). Therefore it is possible to have arbitrarily high information about both x and k (or p) and still satisfy (2.29). We have, then, another illustration that information concepts are more powerful and more natural than the older measures based upon variance.

§3.  Measurement 

We  now  consider  the  question  of  measurement  in  quantum  mechanics, 
which  we  desire  to  treat  as  a  natural  process  within  the  theory  of  pure 
wave  mechanics.  From  our  point  of  view  there  is  no  fundamental  distinc¬ 
tion  between  “measuring  apparata”  and  other  physical  systems.  For  us, 
therefore,  a  measurement  is  simply  a  special  case  of  interaction  between 
physical  systems  -  an  interaction  which  has  the  property  of  correlating  a 
quantity  in  one  subsystem  with  a  quantity  in  another. 

Nearly every interaction between systems produces some correlation, however. Suppose that at some instant a pair of systems are independent, so that the composite system state function is a product of subsystem states ($\psi^S = \psi^{S_1}\psi^{S_2}$). Then this condition obviously holds only instantaneously if the systems are interacting,10 since the independence is immediately destroyed and the systems become correlated. We could, then, take the position that the two interacting systems are continually “measuring” one another, if we wished. At each instant t we could put the composite system into canonical representation, and choose a pair of operators A(t)


10  If $U_t^S$ is the unitary operator generating the time dependence for the state function of the composite system $S = S_1 + S_2$, so that $\psi_t^S = U_t^S\psi_0^S$, then we shall say that $S_1$ and $S_2$ have not interacted during the time interval [0,t] if and only if $U_t^S$ is the direct product of two subsystem unitary operators, i.e., if

$U_t^S = U_t^{S_1} \otimes U_t^{S_2}$.
in $S_1$ and B(t) in $S_2$ which define this representation. We might then reasonably assert that the quantity A in $S_1$ is measured by B in $S_2$ (or vice-versa), since there is a one-one correspondence between their values.

Such a viewpoint, however, does not correspond closely with our intuitive idea of what constitutes “measurement,” since the quantities A and B which turn out to be measured depend not only on the time, but also upon the initial state of the composite system. A more reasonable position is to associate the term “measurement” with a fixed interaction H between systems,11 and to define the “measured quantities” not as those quantities A(t), B(t) which are instantaneously canonically correlated, but as the limit of the instantaneous canonical operators as the time goes to infinity, $A_\infty$, $B_\infty$, provided that this limit exists and is independent of the initial state.12 In such a case we are able to associate the “measured quantities,” $A_\infty$, $B_\infty$, with the interaction H independently of the actual system states and the time. We can therefore say that H is an interaction which causes the quantity $A_\infty$ in $S_1$ to be measured by $B_\infty$ in $S_2$. For finite times of interaction the measurement is only approximate, approaching exactness as the time of interaction increases indefinitely.

There is still one more requirement that we must impose on an interaction before we shall call it a measurement. If H is to produce a measurement of A in $S_1$ by B in $S_2$, then we require that H shall


11  Here H means the total Hamiltonian of S, not just an interaction part.

12  Actually, rather than referring to canonical operators A, B, which are not unique, we should refer to the bases of the canonical representation, $\{\xi_i\}$ in $S_1$ and $\{\eta_j\}$ in $S_2$, since any operators $A = \sum_i \lambda_i[\xi_i]$, $B = \sum_j \mu_j[\eta_j]$, with the completely arbitrary eigenvalues $\lambda_i$, $\mu_j$, are canonical. The limit then refers to the limit of the canonical bases, if it exists in some appropriate sense. However, we shall, for convenience, continue to represent the canonical bases by operators.
never decrease the information in the marginal distribution of A. If H
is  to  produce  a  measurement  of  A  by  correlating  it  with  B,  we  expect 
that  a  knowledge  of  B  shall  give  us  more  information  about  A  than  we 
had  before  the  measurement  took  place,  since  otherwise  the  measurement 
would  be  useless.  Now,  H  might  produce  a  correlation  between  A  and 
B  by  simply  destroying  the  marginal  information  of  A,  without  improving 
the  expected  conditional  information  of  A  given  B,  so  that  a  knowledge 
of  B  would  give  us  no  more  information  about  A  than  we  possessed 
originally.  Therefore  in  order  to  be  sure  that  we  will  gain  information 
about  A  by  knowing  B,  when  B  has  become  correlated  with  A,  it  is 
necessary  that  the  marginal  information  about  A  has  not  decreased.  The 
expected  information  gain  in  this  case  is  assured  to  be  not  less  than  the 
correlation $\{A,B\}$.

The restriction that H shall not decrease the marginal information of A has the interesting consequence that the eigenstates of A will not be disturbed, i.e., initial states of the form $\psi_0^S = \phi\,\eta_0$, where $\phi$ is an eigenfunction of A, must be transformed after any time interval into states of the form $\psi_t^S = \phi\,\eta_t$, since otherwise the marginal information of A, which was initially perfect, would be decreased. This condition, in turn, is connected with the repeatability of measurements, as we shall subsequently see, and could alternately have been chosen as the condition for measurement.

We shall therefore accept the following definition. An interaction H is a measurement of A in $S_1$ by B in $S_2$ if H does not destroy the marginal information of A (equivalently: if H does not disturb the eigenstates of A in the above sense) and if furthermore the correlation $\{A,B\}$ increases toward its maximum13 with time.


13  The maximum of $\{A,B\}$ is $-I_A$ if A has only a discrete spectrum, and $\infty$ if it has a continuous spectrum.
We now illustrate the production of correlation with an example of a simplified measurement due to von Neumann.14 Suppose that we have a system of only one coordinate, q (such as position of a particle), and an apparatus of one coordinate r (for example the position of a meter needle). Further suppose that they are initially independent, so that the combined wave function is $\psi_0^{S+A} = \phi(q)\,\eta(r)$, where $\phi(q)$ is the initial system wave function, and $\eta(r)$ is the initial apparatus function. Finally suppose that the masses are sufficiently large or the time of interaction sufficiently small that the kinetic portion of the energy may be neglected, so that during the time of measurement the Hamiltonian shall consist only of an interaction, which we shall take to be:

(3.1)  $H_I = -i\hbar\,q\,\frac{\partial}{\partial r}$.

Then it is easily verified that the state $\psi_t^{S+A}(q,r)$:

(3.2)  $\psi_t^{S+A}(q,r) = \phi(q)\,\eta(r-qt)$,

is a solution of the Schrödinger equation

(3.3)  $i\hbar\,\frac{\partial \psi_t^{S+A}}{\partial t} = H_I\,\psi_t^{S+A}$

for the specified initial conditions at time t = 0.
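The verification can be written out in two lines. The following is a sketch of that check, assuming the interaction $H_I = -i\hbar\,q\,\partial/\partial r$ of (3.1) and writing $\eta'$ for the derivative of $\eta$ with respect to its argument (a notation introduced here, not used in the text).

```latex
\begin{align*}
i\hbar\,\frac{\partial}{\partial t}\bigl[\phi(q)\,\eta(r-qt)\bigr]
  &= i\hbar\,\phi(q)\,(-q)\,\eta'(r-qt)
   = -i\hbar\,q\,\phi(q)\,\eta'(r-qt),\\
H_I\bigl[\phi(q)\,\eta(r-qt)\bigr]
  &= -i\hbar\,q\,\frac{\partial}{\partial r}\bigl[\phi(q)\,\eta(r-qt)\bigr]
   = -i\hbar\,q\,\phi(q)\,\eta'(r-qt).
\end{align*}
```

The two right-hand sides agree, and at t = 0 the state reduces to the product $\phi(q)\,\eta(r)$, so both the equation and the initial condition are satisfied.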
Translating (3.2) into square amplitudes we get:

(3.4)  $P_t(q,r) = P_1(q)\,P_2(r-qt)$,

where  $P_1(q) = \phi^*(q)\phi(q)$,  $P_2(r) = \eta^*(r)\eta(r)$,

and  $P_t(q,r) = \psi_t^{S+A\,*}(q,r)\,\psi_t^{S+A}(q,r)$,


14  von Neumann [17], p. 442.
and we note that for a fixed time, t, the conditional square amplitude
distribution  for  r  has  been  translated  by  an  amount  depending  upon  the 
value  of  q,  while  the  marginal  distribution  for  q  has  been  unaltered. 

We  see  thus  that  a  correlation  has  been  introduced  between  q  and  r  by 
this  interaction,  which  allows  us  to  interpret  it  as  a  measurement.  It  is 
instructive  to  see  quantitatively  how  fast  this  correlation  takes  place.  We 
note  that: 

(3.5)  $I_{QR}(t) = \iint P_t(q,r)\,\ln P_t(q,r)\,dq\,dr$
       $= \iint P_1(q)P_2(r-qt)\,\ln[P_1(q)P_2(r-qt)]\,dq\,dr$
       $= \iint P_1(q)P_2(\omega)\,\ln[P_1(q)P_2(\omega)]\,dq\,d\omega = I_{QR}(0)$,

so  that  the  information  of  the  joint  distribution  does  not  change.  Further¬ 
more,  since  the  marginal  distribution  for  q  is  unchanged: 


(3.6)  $I_Q(t) = I_Q(0)$,

and the only quantity which can change is the marginal information, $I_R$, of r, whose distribution is:

(3.7)  $P_t(r) = \int P_t(r,q)\,dq = \int P_1(q)\,P_2(r-qt)\,dq$.


Application  of  a  special  inequality  (proved  in  §5,  Appendix  I)  to  (3.7) 
yields  the  relation: 


(3.8)  $I_R(t) \le I_Q(0) - \ln t$,

so that, except for the additive constant $I_Q(0)$, the marginal information $I_R$ tends to decrease at least as fast as ln t with time during the interaction. This implies the relation for the correlation:
(3.9)  $\{Q,R\}_t = I_{QR}(t) - I_Q(t) - I_R(t) \ge I_{RQ}(t) - I_Q(t) - I_Q(0) + \ln t$.

But at t = 0 the distributions for R and Q were independent, so that $I_{RQ}(0) = I_R(0) + I_Q(0)$. Substitution of this relation, (3.5), and (3.6) into (3.9) then yields the final result:

(3.10)  $\{Q,R\}_t \ge I_R(0) - I_Q(0) + \ln t$.

Therefore  the  correlation  is  built  up  at  least  as  fast  as  In  t,  except  for 
an  additive  constant  representing  the  difference  of  the  information  of  the 
initial  distributions  P2(r)  and  P1(q).  Since  the  correlation  goes  to  in¬ 
finity  with  increasing  time,  and  the  marginal  system  distribution  is  not 
changed, the interaction (3.1) satisfies our definition of a measurement of q by r.
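The growth of the correlation can be made concrete for a special case in which every quantity has a closed form. The sketch below assumes that both initial distributions $P_1(q)$ and $P_2(r)$ are normal, with standard deviations s1 and s2 chosen arbitrarily; the apparatus coordinate is then $r = \omega + qt$ with $\omega$ distributed as $P_2$, so its marginal is normal with variance $s_2^2 + t^2 s_1^2$. The function names are hypothetical.

```python
import math

def normal_information(variance):
    """I = integral of P ln P for a normal density with the given variance."""
    return -0.5 * math.log(2.0 * math.pi * math.e * variance)

def correlation(t, s1, s2):
    """{Q,R}_t = I_QR(t) - I_Q(t) - I_R(t) for normal P1 (std s1) and P2 (std s2)."""
    i_q = normal_information(s1 ** 2)                      # (3.6): constant in time
    i_qr = i_q + normal_information(s2 ** 2)               # (3.5): joint information is constant
    i_r_t = normal_information(s2 ** 2 + (t * s1) ** 2)    # marginal of r = omega + q*t
    return i_qr - i_q - i_r_t

def lower_bound(t, s1, s2):
    """Right-hand side of (3.10): I_R(0) - I_Q(0) + ln t."""
    return normal_information(s2 ** 2) - normal_information(s1 ** 2) + math.log(t)

s1, s2 = 1.0, 0.5   # illustrative widths of P1(q) and P2(r)
for t in (0.5, 1.0, 10.0, 100.0):
    print(t, correlation(t, s1, s2), lower_bound(t, s1, s2))
```

In this Gaussian case the correlation is $\frac{1}{2}\ln(1 + t^2 s_1^2/s_2^2)$, which stays above the bound $\ln(t s_1/s_2)$ and approaches it for large t.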

Even  though  the  apparatus  does  not  indicate  any  definite  system  value 
(since  there  are  no  independent  system  or  apparatus  states),  one  can 
nevertheless  look  upon  the  total  wave  function  (3.2)  as  a  superposition  of 
pairs  of  subsystem  states,  each  element  of  which  has  a  definite  q  value 
and  a  correspondingly  displaced  apparatus  state.15  Thus  we  can  write 
(3.2)  as: 

(3.11)  $\psi_t^{S+A} = \int \phi(q')\,\delta(q-q')\,\eta(r-q't)\,dq'$,

which is a superposition of states $\psi_{q'} = \delta(q-q')\,\eta(r-q't)$. Each of these elements, $\psi_{q'}$, of the superposition describes a state in which the system has the definite value q = q', and in which the apparatus has a state that is displaced from its original state by the amount q't. These elements $\psi_{q'}$ are then superposed with coefficients $\phi(q')$ to form the total state (3.11).


15  See discussion of relative states, p. 38.
Conversely, if we transform to the representation where the apparatus is definite, we write (3.2) as:

(3.12)  $\psi_t^{S+A} = \int (1/N_{r'})\,\xi^{r'}(q)\,\delta(r-r')\,dr'$,

where  $\xi^{r'}(q) = N_{r'}\,\phi(q)\,\eta(r'-qt)$

and  $(1/N_{r'})^2 = \int \phi^*(q)\phi(q)\,\eta^*(r'-qt)\,\eta(r'-qt)\,dq$.

Then the $\xi^{r'}(q)$ are the relative system state functions for the apparatus states $\delta(r-r')$ of definite value r = r'.

We notice that these relative system states, $\xi^{r'}(q)$, are nearly eigenstates for the values q = r'/t, if the degree of correlation between q and r is sufficiently high, i.e., if t is sufficiently large, or $\eta(r)$ sufficiently sharp (near $\delta(r)$) then $\xi^{r'}(q)$ is nearly $\delta(q - r'/t)$.
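The sharpening of the relative system states can be seen explicitly when both initial distributions are normal. This is a sketch under that assumption only: the square amplitude of the relative state is proportional to $P_1(q)P_2(r'-qt)$, a product of two Gaussians in q, whose mean and width follow from adding the precisions $1/s_1^2$ and $t^2/s_2^2$. The widths, the value 0.7 and the function name are illustrative.

```python
import math

def relative_state_distribution(r_prime, t, s1, s2):
    """Mean and standard deviation in q of P1(q) * P2(r' - q*t), normalized,
    for normal P1 (std s1, centered at 0) and normal P2 (std s2, centered at 0)."""
    precision = 1.0 / s1 ** 2 + t ** 2 / s2 ** 2
    mean = (r_prime * t / s2 ** 2) / precision
    return mean, math.sqrt(1.0 / precision)

s1, s2, q_value = 1.0, 0.5, 0.7   # illustrative numbers
for t in (1.0, 10.0, 100.0):
    r_prime = q_value * t         # a needle reading corresponding to q = 0.7
    mean, std = relative_state_distribution(r_prime, t, s1, s2)
    print(t, mean, r_prime / t, std)
```

As t grows the mean approaches r'/t and the width falls off like $s_2/t$, so the relative state approaches $\delta(q - r'/t)$ in this special case.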

This  property,  that  the  relative  system  states  become  approximate 
eigenstates  of  the  measurement,  is  in  fact  common  to  all  measurements. 

If we adopt as a measure of the nearness of a state $\psi$ to being an eigenfunction of an operator A the information $I_A(\psi)$, which is reasonable because $I_A(\psi)$ measures the sharpness of the distribution of A for $\psi$, then it is a consequence of our definition of a measurement that the relative system states tend to become eigenstates as the interaction proceeds. Since $\mathrm{Exp}[I_Q^r] = I_Q + \{Q,R\}$, and $I_Q$ remains constant while $\{Q,R\}$ tends toward its maximum (or infinity) during the interaction, we have that $\mathrm{Exp}[I_Q^r]$ tends to a maximum (or infinity). But $I_Q^r$ is just the information in the relative system states, which we have adopted as a measure of the nearness to an eigenstate. Therefore, at least in expectation, the relative system states approach eigenstates.

We have seen that (3.12) is a superposition of states $\psi_{r'}$, for each of which the apparatus has recorded a definite value r', and the system is left in approximately the eigenstate of the measurement corresponding to q = r'/t. The discontinuous “jump” into an eigenstate is thus only a
relative proposition, dependent upon our decomposition of the total wave
function  into  the  superposition,  and  relative  to  a  particularly  chosen  appa¬ 
ratus  value.  So  far  as  the  complete  theory  is  concerned  all  elements  of 
the  superposition  exist  simultaneously,  and  the  entire  process  is  quite 
continuous. 

We  have  here  only  a  special  case  of  the  following  general  principle 
which  will  hold  for  any  situation  which  is  treated  entirely  wave  mechani¬ 
cally: 


PRINCIPLE. For any situation in which the existence of a property $R_i$ for a subsystem $S_1$ of a composite system S will imply the later property $Q_i$ for S, then it is also true that an initial state for $S_1$ of the form $\psi^{S_1} = \sum_i a_i\,\psi^{S_1}_{[R_i]}$, which is a superposition of states with the properties $R_i$, will result in a later state for S of the form $\psi^S = \sum_i a_i\,\psi^S_{[Q_i]}$, which is also a superposition, of states with the property $Q_i$. That is, for any arrangement of an interaction between two systems $S_1$ and $S_2$, which has the property that each initial state $\phi_i^{S_1}\psi^{S_2}$ will result in a final situation with total state $\psi_i^{S_1+S_2}$, an initial state of $S_1$ of the form $\sum_i a_i\phi_i^{S_1}$ will lead, after interaction, to the superposition $\sum_i a_i\,\psi_i^{S_1+S_2}$ for the whole system.

This  follows  immediately  from  the  superposition  principle  for  solutions 
of  a  linear  wave  equation.  It  therefore  holds  for  any  system  of  quantum 
mechanics  for  which  the  superposition  principle  holds,  both  particle  and 
field  theories,  relativistic  or  not,  and  is  applicable  to  all  physical  sys¬ 
tems,  regardless  of  size. 

This  principle  has  the  far  reaching  implication  that  for  any  possible 
measurement,  for  which  the  initial  system  state  is  not  an  eigenstate,  the 
resulting  state  of  the  composite  system  leads  to  no  definite  system  state 
nor  any  definite  apparatus  state.  The  system  will  not  be  put  into  one  or 
another  of  its  eigenstates  with  the  apparatus  indicating  the  corresponding 
value,  and  nothing  resembling  Process  1  can  take  place. 
To see that this is indeed the case, suppose that we have a measuring arrangement with the following properties. The initial apparatus state is $\psi_0^A$. If the system is initially in an eigenstate of the measurement, $\phi_i^S$, then after a specified time of interaction the total state $\phi_i^S\psi_0^A$ will be transformed into a state $\phi_i^S\psi_i^A$, i.e., the system eigenstate shall not be disturbed, and the apparatus state is changed to $\psi_i^A$, which is different for each $\phi_i^S$. ($\psi_i^A$ may for example be a state describing the apparatus as indicating, by the position of a meter needle, the eigenvalue of $\phi_i^S$.) However, if the initial system state is not an eigenstate but a superposition $\sum_i a_i\phi_i^S$, then the final composite system state is also a superposition, $\sum_i a_i\phi_i^S\psi_i^A$. This follows from the superposition principle since all we need do is superpose our solutions for the eigenstates, $\phi_i^S\psi_0^A \to \phi_i^S\psi_i^A$, to arrive at the solution, $\sum_i a_i\phi_i^S\psi_0^A \to \sum_i a_i\phi_i^S\psi_i^A$, for the
general case. Thus in general after a measurement has been performed
there  will  be  no  definite  system  state  nor  any  definite  apparatus  state, 
even  though  there  is  a  correlation.  It  seems  as  though  nothing  can  ever 
be  settled  by  such  a  measurement.  Furthermore  this  result  is  independent 
of  the  size  of  the  apparatus,  and  remains  true  for  apparatus  of  quite  mac¬ 
roscopic  dimensions. 

Suppose,  for  example,  that  we  coupled  a  spin  measuring  device  to  a 
cannonball,  so  that  if  the  spin  is  up  the  cannonball  will  be  shifted  one 
foot  to  the  left,  while  if  the  spin  is  down  it  will  be  shifted  an  equal  dis¬ 
tance  to  the  right.  If  we  now  perform  a  measurement  with  this  arrangement 
upon  a  particle  whose  spin  is  a  superposition  of  up  and  down,  then  the 
resulting  total  state  will  also  be  a  superposition  of  two  states,  one  in 
which  the  cannonball  is  to  the  left,  and  one  in  which  it  is  to  the  right. 
There  is  no  definite  position  for  our  macroscopic  cannonball! 
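The cannonball example is a direct application of linearity, and a toy calculation makes this explicit. The sketch below is a hypothetical finite model, not anything defined in the text: a state is a dictionary from (spin, ball position) pairs to complex amplitudes, and the measuring interaction is specified only on states in which the ball starts at the position labelled 'center'.

```python
import math

def measure_spin(state):
    """Linear map: spin 'up' shifts the ball left, spin 'down' shifts it right."""
    shift = {"up": "left", "down": "right"}
    out = {}
    for (spin, ball), amp in state.items():
        if ball != "center":
            raise ValueError("this sketch only defines the map for a ball at 'center'")
        key = (spin, shift[spin])
        out[key] = out.get(key, 0.0) + amp
    return out

def superpose(terms):
    """Sum of coefficient * state over (coefficient, state) pairs."""
    out = {}
    for coeff, state in terms:
        for key, amp in state.items():
            out[key] = out.get(key, 0.0) + coeff * amp
    return out

a_up, a_down = 1 / math.sqrt(2), 1j / math.sqrt(2)   # illustrative coefficients
up0 = {("up", "center"): 1.0}
down0 = {("down", "center"): 1.0}

# Evolving the superposition gives the same result as superposing the evolved eigenstates.
lhs = measure_spin(superpose([(a_up, up0), (a_down, down0)]))
rhs = superpose([(a_up, measure_spin(up0)), (a_down, measure_spin(down0))])
print(lhs == rhs)
print(lhs)
```

The final dictionary contains two elements, one with the ball to the left and one with it to the right, and no single product state for the cannonball alone.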

This  behavior  seems  to  be  quite  at  variance  with  our  observations, 
since  macroscopic  objects  always  appear  to  us  to  have  definite  positions. 
Can  we  reconcile  this  prediction  of  the  purely  wave  mechanical  theory 
with experience, or must we abandon it as untenable? In order to answer
this  question  we  must  consider  the  problem  of  observation  itself  within 
the  framework  of  the  theory. 


IV.  OBSERVATION 


We  shall  now  give  an  abstract  treatment  of  the  problem  of  observation. 
In  keeping  with  the  spirit  of  our  investigation  of  the  consequences  of  pure 
wave  mechanics  we  have  no  alternative  but  to  introduce  observers,  con¬ 
sidered  as  purely  physical  systems,  into  the  theory. 

We  saw  in  the  last  chapter  that  in  general  a  measurement  (coupling  of 
system  and  apparatus)  had  the  outcome  that  neither  the  system  nor  the 
apparatus  had  any  definite  state  after  the  interaction  —  a  result  seemingly 
at  variance  with  our  experience.  However,  we  do  not  do  justice  to  the 
theory  of  pure  wave  mechanics  until  we  have  investigated  what  the  theory 
itself  says  about  the  appearance  of  phenomena  to  observers,  rather  than 
hastily  concluding  that  the  theory  must  be  incorrect  because  the  actual 
states  of  systems  as  given  by  the  theory  seem  to  contradict  our  observa¬ 
tions. 

We  shall  see  that  the  introduction  of  observers  can  be  accomplished 
in  a  reasonable  manner,  and  that  the  theory  then  predicts  that  the  appear¬ 
ance  of  phenomena,  as  the  subjective  experience  of  these  observers,  is 
precisely  in  accordance  with  the  predictions  of  the  usual  probabilistic 
interpretation  of  quantum  mechanics. 

§1.  Formulation  of  the  problem 

We  are  faced  with  the  task  of  making  deductions  about  the  appearance 
of  phenomena  on  a  subjective  level,  to  observers  which  are  considered  as 
purely  physical  systems  and  are  treated  within  the  theory.  In  order  to 
accomplish  this  it  is  necessary  to  identify  some  objective  properties  of 
such  an  observer  (states)  with  subjective  knowledge  (i.e.,  perceptions). 
Thus, in order to say that an observer O has observed the event $\alpha$, it is necessary that the state of O has become changed from its former state to a new state which is dependent upon $\alpha$.

It  will  suffice  for  our  purposes  to  consider  our  observers  to  possess 
memories  (i.e.,  parts  of  a  relatively  permanent  nature  whose  states  are  in 
correspondence  with  the  past  experience  of  the  observer).  In  order  to 
make  deductions  about  the  subjective  experience  of  an  observer  it  is  suf¬ 
ficient  to  examine  the  contents  of  the  memory. 

As  models  for  observers  we  can,  if  we  wish,  consider  automatically 
functioning  machines,  possessing  sensory  apparata  and  coupled  to  re¬ 
cording  devices  capable  of  registering  past  sensory  data  and  machine 
configurations.  We  can  further  suppose  that  the  machine  is  so  constructed 
that  its  present  actions  shall  be  determined  not  only  by  its  present  sen¬ 
sory  data,  but  by  the  contents  of  its  memory  as  well.  Such  a  machine  will 
then  be  capable  of  performing  a  sequence  of  observations  (measurements), 
and  furthermore  of  deciding  upon  its  future  experiments  on  the  basis  of 
past  results.  We  note  that  if  we  consider  that  current  sensory  data,  as 
well  as  machine  configuration,  is  immediately  recorded  in  the  memory, 
then  the  actions  of  the  machine  at  a  given  instant  can  be  regarded  as  a 
function  of  the  memory  contents  only,  and  all  relevant  experience  of  the 
machine  is  contained  in  the  memory. 

For  such  machines  we  are  justified  in  using  such  phrases  as  “the 
machine  has  perceived  A”  or  “the  machine  is  aware  of  A”  if  the  occur¬ 
rence  of  A  is  represented  in  the  memory,  since  the  future  behavior  of 
the  machine  will  be  based  upon  the  occurrence  of  A.  In  fact,  all  of  the 
customary  language  of  subjective  experience  is  quite  applicable  to  such 
machines,  and  forms  the  most  natural  and  useful  mode  of  expression  when 
dealing  with  their  behavior,  as  is  well  known  to  individuals  who  work 
with  complex  automata. 

When dealing quantum mechanically with a system representing an observer we shall ascribe a state function, $\psi^O$, to it. When the state $\psi^O$ describes an observer whose memory contains representations of the events A,B,...,C we shall denote this fact by appending the memory sequence in brackets as a subscript, writing:

$\psi^O_{[A,B,\ldots,C]}$.

The  symbols  A,B,...,C,  which  we  shall  assume  to  be  ordered  time  wise, 
shall  therefore  stand  for  memory  configurations  which  are  in  correspond¬ 
ence  with  the  past  experience  of  the  observer.  These  configurations  can 
be  thought  of  as  punches  in  a  paper  tape,  impressions  on  a  magnetic  reel, 
configurations  of  a  relay  switching  circuit,  or  even  configurations  of  brain 
cells.  We  only  require  that  they  be  capable  of  the  interpretation  “The 
observer  has  experienced  the  succession  of  events  A,B,...,C.”  (We  shall 
sometimes  write  dots  in  a  memory  sequence,  [...  A,B,...,C],  to  indicate 
the  possible  presence  of  previous  memories  which  are  irrelevant  to  the 
case  being  considered.) 

Our  problem  is,  then,  to  treat  the  interaction  of  such  observer-systems 
with  other  physical  systems  (observations),  within  the  framework  of  wave 
mechanics,  and  to  deduce  the  resulting  memory  configurations,  which  we 
can  then  interpret  as  the  subjective  experiences  of  the  observers. 

We begin by defining what shall constitute a “good” observation. A good observation of a quantity A, with eigenfunctions $\{\phi_i\}$ for a system S, by an observer whose initial state is $\psi^O_{[\ldots]}$, shall consist of an interaction which, in a specified period of time, transforms each (total) state

$\psi^{S+O} = \phi_i\,\psi^O_{[\ldots]}$

into a new state

$\psi^{S+O'} = \phi_i\,\psi^O_{[\ldots\alpha_i]}$

where $\alpha_i$ characterizes the state $\phi_i$. (It might stand for a recording of the eigenvalue, for example.) That is, our requirement is that the system state, if it is an eigenstate, shall be unchanged, and that the observer state shall change so as to describe an observer that is “aware” of which eigenfunction it is, i.e., some property is recorded in the memory of the observer which characterizes $\phi_i$, such as the eigenvalue. The requirement that the eigenstates for the system be unchanged is necessary if the
observation  is  to  be  significant  (repeatable),  and  the  requirement  that  the 
observer  state  change  in  a  manner  which  is  different  for  each  eigenfunc¬ 
tion  is  necessary  if  we  are  to  be  able  to  call  the  interaction  an  observa¬ 
tion  at  all. 


§2.  Deductions 

From these requirements we shall first deduce the result of an observation upon a system which is not in an eigenstate of the observation. We know, by our previous remark upon what constitutes a good observation, that the interaction transforms states $\phi_i\,\psi^O_{[\ldots]}$ into states $\phi_i\,\psi^O_{[\ldots\alpha_i]}$. Consequently we can simply superpose these solutions of the wave equation to arrive at the final state for the case of an arbitrary initial system state. Thus if the initial system state is not an eigenstate, but a general state $\sum_i a_i\phi_i$, we get for the final total state:

(2.1)  $\psi^{S+O'} = \sum_i a_i\,\phi_i\,\psi^O_{[\ldots\alpha_i]}$.

This remains true also in the presence of further systems which do not interact for the time of measurement. Thus, if systems $S_1, S_2, \ldots, S_n$ are present as well as O, with original states $\psi^{S_1}, \psi^{S_2}, \ldots, \psi^{S_n}$, and the only interaction during the time of measurement is between $S_1$ and O, the result of the measurement will be the transformation of the initial total state:

$\psi^{S_1+S_2+\cdots+S_n+O} = \psi^{S_1}\psi^{S_2}\cdots\psi^{S_n}\,\psi^O_{[\ldots]}$

into the final state:

(2.2)  $\psi'^{\,S_1+S_2+\cdots+S_n+O} = \sum_i a_i\,\phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\,\psi^O_{[\ldots\alpha_i^1]}$

where $a_i = (\phi_i^{S_1}, \psi^{S_1})$ and $\phi_i^{S_1}$ are eigenfunctions of the observation.

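Thus we arrive at the general rule for the transformation of total state functions which describe systems within which observation processes occur:

Rule 1. The observation of a quantity A, with eigenfunctions $\phi_i^{S_1}$, in a system $S_1$ by the observer O, transforms the total state according to:

$\psi^{S_1}\psi^{S_2}\cdots\psi^{S_n}\,\psi^O_{[\ldots]} \to \sum_i a_i\,\phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\,\psi^O_{[\ldots\alpha_i^1]}$

where $a_i = (\phi_i^{S_1}, \psi^{S_1})$.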


If  we  next  consider  a  second  observation  to  be  made,  where  our  total 
state  is  now  a  superposition,  we  can  apply  Rule  1  separately  to  each  ele¬ 
ment  of  the  superposition,  since  each  element  separately  obeys  the  wave 
equation  and  behaves  independently  of  the  remaining  elements,  and  then 
superpose  the  results  to  obtain  the  final  solution.  We  formulate  this  as: 


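Rule 2. Rule 1 may be applied separately to each element of a superposition of total system states, the results being superposed to obtain the final total state. Thus, a determination of B, with eigenfunctions $\eta_j^{S_2}$, on $S_2$ by the observer O transforms the total state

$\sum_i a_i\,\phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\,\psi^O_{[\ldots\alpha_i^1]}$

into the state

$\sum_{i,j} a_i b_j\,\phi_i^{S_1}\eta_j^{S_2}\psi^{S_3}\cdots\psi^{S_n}\,\psi^O_{[\ldots\alpha_i^1,\beta_j^2]}$

where $b_j = (\eta_j^{S_2}, \psi^{S_2})$, which follows from the application of Rule 1 to each element $\phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\,\psi^O_{[\ldots\alpha_i^1]}$, and then superposing the results with the coefficients $a_i$.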
These two rules, which follow directly from the superposition principle, give us a convenient method for determining final total states for any
number  of  observation  processes  in  any  combinations.  We  must  now  seek 
the  interpretation  of  such  final  total  states. 
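The bookkeeping prescribed by Rules 1 and 2 can be mimicked with a small data structure. The following is a hypothetical toy model, not notation from the text: two subsystems are expanded in the eigenbases of the observed quantities, a total state is a dictionary from (eigen-indices, memory sequence) to amplitude, and an observation appends a record to the memory of each element separately. The coefficients and the labels 'alpha', 'beta' are illustrative.

```python
import itertools

def product_state(a, b):
    """Amplitudes a_i * b_j of psi^{S1} psi^{S2} in the eigenbases; observer memory empty."""
    pairs = itertools.product(range(len(a)), range(len(b)))
    return {((i, j), ()): a[i] * b[j] for i, j in pairs}

def observe(state, which, label):
    """Rule 1 applied to every element (Rule 2): record the eigen-index of
    subsystem `which` at the end of that element's memory sequence."""
    out = {}
    for (indices, memory), amp in state.items():
        record = f"{label}{indices[which]}"
        key = (indices, memory + (record,))
        out[key] = out.get(key, 0.0) + amp
    return out

a = [0.6, 0.8]     # a_i = (phi_i, psi^{S1}), illustrative
b = [0.8, -0.6]    # b_j = (eta_j, psi^{S2}), illustrative
state = observe(product_state(a, b), which=0, label="alpha")   # observe A on S1
state = observe(state, which=1, label="beta")                  # then B on S2
state = observe(state, which=0, label="alpha")                 # repeat A on S1
for (indices, memory), amp in sorted(state.items()):
    print(indices, memory, amp)
```

Each element of the resulting superposition carries a definite memory sequence, the amplitudes are the products $a_i b_j$, and within every element the repeated observation of A records the same value as the first.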

Let us consider the simple case of a single observation of a quantity A, with eigenfunctions $\phi_i$, in the system S with initial state $\psi^S$, by an observer O whose initial state is $\psi^O_{[\ldots]}$. The final result is, as we have seen, the superposition:

(2.3)  $\psi'^{\,S+O} = \sum_i a_i\,\phi_i\,\psi^O_{[\ldots\alpha_i]}$.

We note that there is no longer any independent system state or observer state, although the two have become correlated in a one-one manner. However, in each element of the superposition (2.3), $\phi_i\,\psi^O_{[\ldots\alpha_i]}$, the object-system state is a particular eigenstate of the observation, and furthermore the observer-system state describes the observer as definitely perceiving that particular system state.1 It is this correlation which allows one to maintain the interpretation that a measurement has been performed.

We now carry the discussion a step further and allow the observer-system to repeat the observation. Then according to Rule 2 we arrive at the total state after the second observation:

$$\psi''^{\,S+O} = \sum_i a_i\,\phi_i\,\psi^O_{[\ldots,\alpha_i,\alpha_i]} \tag{2.4}$$

Again, we see that each element of (2.4), $\phi_i\,\psi^O_{[\ldots,\alpha_i,\alpha_i]}$, describes a system eigenstate, but this time also describes the observer as having obtained the same result for each of the two observations. Thus for every separate state of the observer in the final superposition, the result of the observation was repeatable, even though different for different states.

This repeatability is, of course, a consequence of the fact that after an observation the relative system state for a particular observer state is the corresponding eigenstate.

1

At this point we encounter a language difficulty. Whereas before the observation we had a single observer state, afterwards there were a number of different states for the observer, all occurring in a superposition. Each of these separate states is a state for an observer, so that we can speak of the different observers described by the different states. On the other hand, the same physical system is involved, and from this viewpoint it is the same observer, which is in different states for different elements of the superposition (i.e., has had different experiences in the separate elements of the superposition). In this situation we shall use the singular when we wish to emphasize that a single physical system is involved, and the plural when we wish to emphasize the different experiences for the separate elements of the superposition. (E.g., “The observer performs an observation of the quantity A, after which each of the observers of the resulting superposition has perceived an eigenvalue.”)

Let us suppose now that an observer-system $O$, with initial state $\psi^O_{[\ldots]}$, measures the same quantity $A$ in a number of separate identical systems which are initially in the same state, $\psi^{S_1} = \psi^{S_2} = \cdots = \psi^{S_n} = \sum_i a_i\phi_i$ (where the $\phi_i$ are, as usual, eigenfunctions of $A$). The initial total state function is then

$$\psi_0^{S_1+S_2+\cdots+S_n+O} = \psi^{S_1}\psi^{S_2}\cdots\psi^{S_n}\,\psi^O_{[\ldots]} \tag{2.3}$$

We shall assume that the measurements are performed on the systems in the order $S_1, S_2, \ldots, S_n$. Then the total state after the first measurement will be, by Rule 1,

$$\psi_1^{S_1+S_2+\cdots+S_n+O} = \sum_i a_i\,\phi_i^{S_1}\psi^{S_2}\cdots\psi^{S_n}\,\psi^O_{[\ldots,\alpha_i^1]} \tag{2.4}$$

(where $\alpha_i^1$ refers to the first system, $S_1$).


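After the second measurement it will be, by Rule 2,

$$\psi_2^{S_1+S_2+\cdots+S_n+O} = \sum_{i,j} a_i a_j\,\phi_i^{S_1}\phi_j^{S_2}\psi^{S_3}\cdots\psi^{S_n}\,\psi^O_{[\ldots,\alpha_i^1,\alpha_j^2]} \tag{2.5}$$

and in general, after $r$ measurements have taken place ($r \le n$) Rule 2 gives the result:

$$\psi_r = \sum_{i,j,\ldots,k} a_i a_j \cdots a_k\,\phi_i^{S_1}\phi_j^{S_2}\cdots\phi_k^{S_r}\,\psi^{S_{r+1}}\cdots\psi^{S_n}\,\psi^O_{[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]} \tag{2.6}$$

We can give this state, $\psi_r$, the following interpretation. It consists of a superposition of states:

$$\psi'_{ij\ldots k} = \phi_i^{S_1}\phi_j^{S_2}\cdots\phi_k^{S_r}\,\psi^{S_{r+1}}\cdots\psi^{S_n}\,\psi^O_{[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]} \tag{2.7}$$

each of which describes the observer with a definite memory sequence $[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]$, and relative to whom the (observed) system states are the corresponding eigenfunctions $\phi_i^{S_1}, \phi_j^{S_2}, \ldots, \phi_k^{S_r}$, the remaining systems, $S_{r+1}, \ldots, S_n$, being unaltered.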

In the language of subjective experience, the observer which is described by a typical element, $\psi'_{ij\ldots k}$, of the superposition has perceived an apparently random sequence of definite results for the observations. It is furthermore true, since in each element the system has been left in an eigenstate of the measurement, that if at this stage a redetermination of an earlier system observation ($S_\ell$) takes place, every element of the resulting final superposition will describe the observer with a memory configuration of the form $[\ldots,\alpha_i^1,\ldots,\alpha_j^\ell,\ldots,\alpha_k^r,\alpha_j^\ell]$ in which the earlier memory coincides with the later, i.e., the memory states are correlated. It will thus appear to the observer which is described by a typical element of the superposition that each initial observation on a system caused the system to “jump” into an eigenstate in a random fashion and thereafter remain there for subsequent measurements on the same system. Therefore, qualitatively, at least, the probabilistic assertions of Process 1 appear to be valid to the observer described by a typical element of the final superposition.

In order to establish quantitative results, we must put some sort of measure (weighting) on the elements of a final superposition. This is necessary to be able to make assertions which will hold for almost all of the observers described by elements of a superposition. In order to make quantitative statements about the relative frequencies of the different possible results of observation which are recorded in the memory of a typical observer we must have a method of selecting a typical observer.

Let us therefore consider the search for a general scheme for assigning a measure to the elements of a superposition of orthogonal states $\sum_i a_i\phi_i$. We require then a positive function $\mathcal{M}$ of the complex coefficients of the elements of the superposition, so that $\mathcal{M}(a_i)$ shall be the measure assigned to the element $\phi_i$. In order that this general scheme shall be unambiguous we must first require that the states themselves always be normalized, so that we can distinguish the coefficients from the states. However, we can still only determine the coefficients, in distinction to the states, up to an arbitrary phase factor, and hence the function $\mathcal{M}$ must be a function of the amplitudes of the coefficients alone (i.e., $\mathcal{M}(a_i) = \mathcal{M}(\sqrt{a_i^* a_i})$), in order to avoid ambiguities.

If we now impose the additivity requirement that if we regard a subset of the superposition, say $\sum_{i=1}^{n} a_i\phi_i$, as a single element $\alpha\phi'$:

$$\alpha\phi' = \sum_{i=1}^{n} a_i\phi_i, \tag{2.8}$$

then the measure assigned to $\phi'$ shall be the sum of the measures assigned to the $\phi_i$ ($i$ from 1 to $n$):

$$\mathcal{M}(\alpha) = \sum_i \mathcal{M}(a_i), \tag{2.9}$$

then we have already restricted the choice of $\mathcal{M}$ to the square amplitude alone ($\mathcal{M}(a_i) = a_i^* a_i$, apart from a multiplicative constant).

To see this we note that the normality of $\phi'$ requires that $|\alpha| = \sqrt{\sum_i a_i^* a_i}$. From our remarks upon the dependence of $\mathcal{M}$ upon the amplitude alone, we replace the $a_i$ by their amplitudes $\mu_i = |a_i|$. (2.9) then requires that

$$\mathcal{M}(\alpha) = \mathcal{M}\Big(\sqrt{\textstyle\sum_i a_i^* a_i}\Big) = \mathcal{M}\Big(\sqrt{\textstyle\sum_i \mu_i^2}\Big) = \sum_i \mathcal{M}(\mu_i) = \sum_i \mathcal{M}\Big(\sqrt{\mu_i^2}\Big). \tag{2.10}$$

Defining a new function $g(x)$:

$$g(x) = \mathcal{M}(\sqrt{x}), \tag{2.11}$$

we see that (2.10) requires that

$$g\Big(\sum_i \mu_i^2\Big) = \sum_i g(\mu_i^2), \tag{2.12}$$

so that $g$ is restricted to be linear and necessarily has the form:

$$g(x) = cx \quad (c \text{ constant}). \tag{2.13}$$

Therefore $g(x^2) = cx^2 = \mathcal{M}(\sqrt{x^2}) = \mathcal{M}(x)$ and we have deduced that $\mathcal{M}$ is restricted to the form

$$\mathcal{M}(a_i) = \mathcal{M}(\mu_i) = c\mu_i^2 = c\,a_i^* a_i, \tag{2.14}$$

and we have shown that the only choice of measure consistent with our additivity requirement is the square amplitude measure, apart from an arbitrary multiplicative constant which may be fixed, if desired, by normalization requirements. (The requirement that the total measure be unity implies that this constant is 1.)
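The additivity requirement (2.9) can be checked numerically for a few candidate measures. The following minimal Python sketch is an illustration under stated assumptions, not part of the original argument: the helper name `additivity_gap`, the power-law candidates $\mu^p$ and the sample coefficients are all hypothetical. It evaluates the difference between the two sides of (2.9) when $|\alpha|$ is fixed by normalization.

```python
import math

def additivity_gap(measure, coefficients):
    """Return measure(|alpha|) - sum_i measure(|a_i|), where
    |alpha| = sqrt(sum_i |a_i|^2) is fixed by the normality of phi'."""
    mus = [abs(a) for a in coefficients]          # amplitudes mu_i = |a_i|
    alpha = math.sqrt(sum(mu * mu for mu in mus))
    return measure(alpha) - sum(measure(mu) for mu in mus)

# Hypothetical coefficients of a subset of a superposition.
coeffs = [0.5 + 0.5j, 0.3, -0.4j]

# Candidate measures mu -> mu**p for p = 1, 2, 3.
gaps = {p: additivity_gap(lambda mu, p=p: mu ** p, coeffs) for p in (1, 2, 3)}
```

For these sample coefficients only the entry for $p = 2$ vanishes (up to rounding), in line with (2.14); the other powers violate additivity.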

The situation here is fully analogous to that of classical statistical mechanics, where one puts a measure on trajectories of systems in the phase space by placing a measure on the phase space itself, and then making assertions which hold for “almost all” trajectories (such as ergodicity, quasi-ergodicity, etc.).2 This notion of “almost all” depends here also upon the choice of measure, which is in this case taken to be Lebesgue measure on the phase space. One could, of course, contradict the statements of classical statistical mechanics by choosing a measure for which only the exceptional trajectories had nonzero measure. Nevertheless the choice of Lebesgue measure on the phase space can be justified by the fact that it is the only choice for which the “conservation of probability” holds (Liouville’s theorem), and hence the only choice which makes possible any reasonable statistical deductions at all.

2

See Khinchin [16].

In our case, we wish to make statements about “trajectories” of observers. However, for us a trajectory is constantly branching (transforming from state to superposition) with each successive measurement. To have a requirement analogous to the “conservation of probability” in the classical case, we demand that the measure assigned to a trajectory at one time shall equal the sum of the measures of its separate branches at a later time. This is precisely the additivity requirement which we imposed and which leads uniquely to the choice of square-amplitude measure. Our procedure is therefore quite as justified as that of classical statistical mechanics.


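Having deduced that there is a unique measure which will satisfy our requirements, the square-amplitude measure, we continue our deduction. This measure then assigns to the $(i,j,\ldots,k)$th element of the superposition (2.6),

$$\phi_i^{S_1}\phi_j^{S_2}\cdots\phi_k^{S_r}\,\psi^{S_{r+1}}\cdots\psi^{S_n}\,\psi^O_{[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]}, \tag{2.15}$$

the measure (weight)

$$M_{ij\ldots k} = (a_i a_j \cdots a_k)^*(a_i a_j \cdots a_k), \tag{2.16}$$

so that the observer state with memory configuration $[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]$ is assigned the measure $a_i^* a_i\,a_j^* a_j \cdots a_k^* a_k = M_{ij\ldots k}$. We see immediately that this is a product measure, namely

$$M_{ij\ldots k} = M_i M_j \cdots M_k \tag{2.17}$$

where

$$M_\ell = a_\ell^* a_\ell,$$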



so that the measure assigned to a particular memory sequence $[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]$ is simply the product of the measures for the individual components of the memory sequence.

We notice now a direct correspondence of our measure structure to the probability theory of random sequences. Namely, if we were to regard the $M_{ij\ldots k}$ as probabilities for the sequences $[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^r]$ then the sequences are equivalent to the random sequences which are generated by ascribing to each term the independent probabilities $M_\ell = a_\ell^* a_\ell$. Now the probability theory is equivalent to measure theory mathematically, so that we can make use of it, while keeping in mind that all results should be translated back to measure theoretic language.

Thus, in particular, if we consider the sequences to become longer and longer (more and more observations performed) each memory sequence of the final superposition will satisfy any given criterion for a randomly generated sequence, generated by the independent probabilities $a_\ell^* a_\ell$, except for a set of total measure which tends toward zero as the number of observations becomes unlimited. Hence all averages of functions over any memory sequence, including the special case of frequencies, can be computed from the probabilities $a_\ell^* a_\ell$, except for a set of memory sequences of measure zero. We have therefore shown that the statistical assertions of Process 1 will appear to be valid to almost all observers described by separate elements of the superposition (2.6), in the limit as the number of observations goes to infinity.
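The statement that the exceptional set has measure tending to zero can be illustrated for a two-valued observable. The sketch below is a minimal Python illustration under assumptions of our own choosing: the function name `atypical_measure`, the value $p = a_\ell^* a_\ell = 0.2$ for one of the two results, and the tolerance `eps` are hypothetical. Because the measure (2.17) is a product measure, the total measure of all length-$n$ memory sequences containing that result exactly $k$ times is the binomial weight $\binom{n}{k}p^k(1-p)^{n-k}$.

```python
from math import comb

def atypical_measure(p, n, eps):
    """Total square-amplitude measure of the length-n memory sequences whose
    relative frequency of the chosen result differs from p by more than eps."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(n + 1)
               if abs(k / n - p) > eps)

p, eps = 0.2, 0.05                       # hypothetical weight and tolerance
measures = [atypical_measure(p, n, eps) for n in (10, 100, 1000)]
```

The entries of `measures` decrease as the number of observations grows, which is the finite-$n$ counterpart of the limiting statement above. Note that the count of atypical sequences does not shrink in the same way; the conclusion depends on weighting them by the square-amplitude measure.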

While we have so far considered only sequences of observations of the same quantity upon identical systems, the result is equally true for arbitrary sequences of observations. For example, the sequence of observations of the quantities $A^1, A^2, \ldots, A^n, \ldots$ with (generally different) eigenfunction sets $\{\phi_i^1\}, \{\phi_j^2\}, \ldots, \{\phi_k^n\}, \ldots$ applied successively to the systems $S_1, S_2, \ldots, S_n, \ldots$, with (arbitrary) initial states $\psi^{S_1}, \psi^{S_2}, \ldots, \psi^{S_n}, \ldots$ transforms the total initial state:

$$\psi_0^{S_1+\cdots+S_n+O} = \psi^{S_1}\psi^{S_2}\cdots\psi^{S_n}\,\psi^O_{[\ldots]} \tag{2.18}$$

by Rules 1 and 2, into the final state:

$$\psi'^{\,S_1+S_2+\cdots+S_n+O} = \sum_{i,j,\ldots,k} (\phi_i^1,\psi^{S_1})(\phi_j^2,\psi^{S_2})\cdots(\phi_k^n,\psi^{S_n})\;\phi_i^1\phi_j^2\cdots\phi_k^n\,\psi^O_{[\ldots,\alpha_i^1,\alpha_j^2,\ldots,\alpha_k^n]} \tag{2.19}$$

where the memory sequence element $\alpha_\ell^r$ characterizes the $\ell$th eigenfunction, $\phi_\ell^r$, of the operator $A^r$. Again the square amplitude measure for each element of the superposition (2.19) reduces to the product measure of the individual memory element measures, $|(\phi_\ell^r,\psi^{S_r})|^2$ for the memory sequence element $\alpha_\ell^r$. Therefore, the memory sequence of a typical element of (2.19) has all the characteristics of a random sequence, with individual, independent (and now different), probabilities $|(\phi_\ell^r,\psi^{S_r})|^2$ for the $r$th memory state.

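Finally, we can generalize to the case where several observations are allowed to be performed upon the same system. For example, if we permit the observation of a new quantity $B^r$ (eigenfunctions $\eta_m^r$, memory characterization $\beta_m^r$) upon the system $S_r$ for which $A^r$ has already been observed, then the state (2.19):

$$\psi' = \sum_{i,\ldots,\ell,\ldots,k} (\phi_i^1,\psi^{S_1})\cdots(\phi_\ell^r,\psi^{S_r})\cdots(\phi_k^n,\psi^{S_n})\;\phi_i^1\cdots\phi_\ell^r\cdots\phi_k^n\,\psi^O_{[\ldots,\alpha_i^1,\ldots,\alpha_\ell^r,\ldots,\alpha_k^n]} \tag{2.20}$$

is transformed by Rule 2 into the state:

$$\psi'' = \sum_{i,\ldots,\ell,\ldots,k,m} (\phi_i^1,\psi^{S_1})\cdots(\phi_\ell^r,\psi^{S_r})\cdots(\phi_k^n,\psi^{S_n})(\eta_m^r,\phi_\ell^r)\;\phi_i^1\cdots\eta_m^r\cdots\phi_k^n\,\psi^O_{[\ldots,\alpha_i^1,\ldots,\alpha_\ell^r,\ldots,\alpha_k^n,\beta_m^r]} \tag{2.21}$$

The relative system states for $S_r$ have been changed from the eigenstates of $A^r$, $\{\phi_\ell^r\}$, to the eigenstates of $B^r$, $\{\eta_m^r\}$. We notice further that, with respect to our measure on the superposition, the memory sequences still have the character of random sequences, but of random sequences for which the individual terms are no longer independent. The memory states $\beta_m^r$ now depend upon the memory states $\alpha_\ell^r$ which represent the result of the previous measurement upon the same system, $S_r$. The joint (normalized) measure for this pair of memory states, conditioned by fixed values for remaining memory states is:

$$M^{i\ldots k}(\alpha_\ell^r,\beta_m^r) = \frac{M(\alpha_i^1,\ldots,\alpha_\ell^r,\ldots,\alpha_k^n,\beta_m^r)}{\sum_{\ell,m} M(\alpha_i^1,\ldots,\alpha_\ell^r,\ldots,\alpha_k^n,\beta_m^r)} = \frac{\big|(\phi_i^1,\psi^{S_1})\cdots(\phi_\ell^r,\psi^{S_r})\cdots(\phi_k^n,\psi^{S_n})(\eta_m^r,\phi_\ell^r)\big|^2}{\sum_{\ell,m}\big|(\phi_i^1,\psi^{S_1})\cdots(\phi_\ell^r,\psi^{S_r})\cdots(\phi_k^n,\psi^{S_n})(\eta_m^r,\phi_\ell^r)\big|^2} = |(\phi_\ell^r,\psi^{S_r})|^2\,|(\eta_m^r,\phi_\ell^r)|^2 \tag{2.22}$$

The joint measure (2.22) is, first of all, independent of the memory states for the remaining systems ($S_1 \ldots S_n$ excluding $S_r$). Second, the dependence of $\beta_m^r$ on $\alpha_\ell^r$ is equivalent, measure theoretically, to that given by the stochastic process 3 which converts the states $\phi_\ell^r$ into the states $\eta_m^r$ with transition probabilities:

$$T_{\ell m} = \text{Prob.}(\phi_\ell^r \rightarrow \eta_m^r) = |(\eta_m^r,\phi_\ell^r)|^2. \tag{2.23}$$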


3

Cf. Chapter II, §6.

If we were to allow yet another quantity $C$ to be measured in $S_r$, the new memory states $\gamma_k^r$, corresponding to the eigenfunctions of $C$, would have a similar dependence upon the previous states $\beta_m^r$, but no direct dependence on the still earlier states $\alpha_\ell^r$. This dependence upon only the previous result of observation is a consequence of the fact that the relative system states are completely determined by the last observation.
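The transition probabilities (2.23) and the joint measure (2.22) can be tabulated for a small example. The following Python sketch is illustrative only and rests on assumptions of our own: a two-dimensional system, a hypothetical eigenbasis `phi` for $A^r$, a second basis `eta` for $B^r$ rotated through an arbitrarily chosen angle, and an arbitrary normalized initial state `psi`.

```python
import math

def inner(u, v):
    """Inner product (u, v), conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

theta = math.pi / 6                                   # hypothetical rotation angle
phi = [(1.0, 0.0), (0.0, 1.0)]                        # eigenfunctions of A^r
eta = [(math.cos(theta), math.sin(theta)),            # eigenfunctions of B^r
       (-math.sin(theta), math.cos(theta))]
psi = (0.6, 0.8j)                                     # initial state of S_r

# Transition probabilities T[l][m] = |(eta_m, phi_l)|^2, as in (2.23).
T = [[abs(inner(e, p)) ** 2 for e in eta] for p in phi]

# Joint measure |(phi_l, psi)|^2 * T[l][m], as in the last form of (2.22).
joint = [[abs(inner(p, psi)) ** 2 * T[l][m] for m in range(2)]
         for l, p in enumerate(phi)]
```

Each row of `T` sums to one, so `T` has the form of a stochastic matrix, and the entries of `joint` sum to one. A third observation would attach a further factor that depends only on the index $m$, which is the Markov-like dependence described in the text.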

We can therefore summarize the situation for an arbitrary sequence of observations, upon the same or different systems in any order, and for which the number of observations of each quantity in each system is very large, with the following result:

Except for a set of memory sequences of measure nearly zero, the averages of any functions over a memory sequence can be calculated approximately by the use of the independent probabilities given by Process 1 for each initial observation on a system, and by the use of the transition probabilities (2.23) for succeeding observations upon the same system. In the limit, as the number of all types of observations goes to infinity, the calculation is exact, and the exceptional set has measure zero.

This prescription for the calculation of averages over memory sequences by probabilities assigned to individual elements is precisely that of the orthodox theory (Process 1). Therefore all predictions of the usual theory will appear to be valid to the observer in almost all observer states, since these predictions hold for almost all memory sequences.

In particular, the uncertainty principle is never violated, since, as above, the latest measurement upon a system supplies all possible information about the relative system state, so that there is no direct correlation between any earlier results of observation on the system, and the succeeding observation. Any observation of a quantity $B$, between two successive observations of quantity $A$ (all on the same system) will destroy the one-one correspondence between the earlier and later memory states for the result of $A$. Thus for alternating observations of different quantities there are fundamental limitations upon the correlations between memory states for the same observed quantity, these limitations expressing the content of the uncertainty principle.

In conclusion, we have described in this section processes involving an idealized observer, processes which are entirely deterministic and continuous from the over-all viewpoint (the total state function is presumed to satisfy a wave equation at all times) but whose result is a superposition, each element of which describes the observer with a different memory state. We have seen that in almost all of these observer states it appears to the observer that the probabilistic aspects of the usual form of quantum theory are valid. We have thus seen how pure wave mechanics, without any initial probability assertions, can lead to these notions on a subjective level, as appearances to observers.

§3.  Several  observers 

We shall now consider the consequences of our scheme when several observers are allowed to interact with the same systems, as well as with one another (communication). In the following discussion observers shall be denoted by $O_1, O_2, \ldots$, other systems by $S_1, S_2, \ldots$, and observables by operators $A$, $B$, $C$, with eigenfunctions $\{\phi_i\}$, $\{\eta_j\}$, $\{\xi_k\}$ respectively.

The symbols $\alpha_i$, $\beta_j$, $\gamma_k$, occurring in memory sequences shall refer to characteristics of the states $\phi_i$, $\eta_j$, $\xi_k$, respectively. ($\psi^{O_j}_{[\ldots,\alpha_i]}$ is interpreted as describing an observer, $O_j$, who has just observed the eigenvalue corresponding to $\phi_i$, i.e., who is “aware” that the system is in state $\phi_i$.)

We shall also wish to allow communication among the observers, which we view as an interaction by means of which the memory sequences of different observers become correlated. (For example, the transfer of impulses from the magnetic tape memory of one mechanical observer to that of another constitutes such a transfer of information.)4 We shall regard these processes as observations made by one observer on another and shall use the notation that

$$\psi^{O_i}_{[\ldots,\alpha_j^{O_k}]}$$

represents a state function describing an observer $O_i$ who has obtained the information $\alpha_j$ from another observer, $O_k$. Thus the obtaining of information about $A$ from $O_1$ by $O_2$ will transform the state

$$\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots]}$$

into the state

$$\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\alpha_i^{O_1}]}. \tag{3.1}$$

Rules 1 and 2 are, of course, equally applicable to these interactions. We shall now illustrate the possibilities for several observers, by considering several cases.

4

We assume that such transfers merely duplicate, but do not destroy, the original information.


Case  1:  We  allow  two  observers  to  separately  observe  the  same  quantity 
in  a  system,  and  then  compare  results. 

We suppose that first observer $O_1$ observes the quantity $A$ for the system $S$. Then by Rule 1 the original state

$$\psi^{S+O_1+O_2} = \psi^S\,\psi^{O_1}_{[\ldots]}\,\psi^{O_2}_{[\ldots]}$$

is transformed into the state

$$\psi' = \sum_i a_i\,\phi_i\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots]}. \tag{3.2}$$

We now suppose that $O_2$ observes $A$, and by Rule 2 the state becomes:

$$\psi'' = \sum_i a_i\,\phi_i\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\alpha_i]}. \tag{3.3}$$

We now allow $O_2$ to “consult” $O_1$, which leads in the same fashion from (3.1) and Rule 2 to the final state

$$\psi''' = \sum_i a_i\,\phi_i\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\alpha_i,\alpha_i^{O_1}]}. \tag{3.4}$$

Thus, for every element of the superposition the information obtained from $O_1$ agrees with that obtained directly from the system. This means that observers who have separately observed the same quantity will always agree with each other.

Furthermore, it is obvious at this point that the same result, (3.4), is obtained if $O_2$ first consults $O_1$, then performs the direct observation, except that the memory sequence for $O_2$ is reversed ($[\ldots,\alpha_i^{O_1},\alpha_i]$ instead of $[\ldots,\alpha_i,\alpha_i^{O_1}]$). There is still perfect agreement in every element of the superposition. Therefore, information obtained from another observer is always reliable, since subsequent direct observation will always verify it. We thus see the central role played by correlations in wave functions for the preservation of consistency in situations where several observers are allowed to consult one another. It is the transitivity of correlation in these cases (that if $S_1$ is correlated to $S_2$, and $S_2$ to $S_3$, then so is $S_1$ to $S_3$) which is responsible for this consistency.


Case 2: We allow two observers to measure separately two different, noncommuting quantities in the same system.

Assume that first $O_1$ observes $A$ for the system, so that, as before, the initial state $\psi^S\,\psi^{O_1}_{[\ldots]}\,\psi^{O_2}_{[\ldots]}$ is transformed to:

$$\psi' = \sum_i (\phi_i,\psi^S)\,\phi_i\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots]}. \tag{3.5}$$

Next let $O_2$ determine $B$ for the system, where $\{\eta_j\}$ are the eigenfunctions of $B$. Then by application of Rule 2 the result is

$$\psi'' = \sum_{i,j} (\phi_i,\psi^S)(\eta_j,\phi_i)\,\eta_j\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\beta_j]}. \tag{3.6}$$

$O_2$ is now perfectly correlated with the system, since a redetermination by him will lead to agreeing results. This is no longer the case for $O_1$, however, since a redetermination of $A$ by him will result in (by Rule 2)

$$\psi''' = \sum_{i,j,k} (\phi_i,\psi^S)(\eta_j,\phi_i)(\phi_k,\eta_j)\,\phi_k\,\psi^{O_1}_{[\ldots,\alpha_i,\alpha_k]}\,\psi^{O_2}_{[\ldots,\beta_j]}. \tag{3.7}$$

Hence the second measurement of $O_1$ does not in all cases agree with the first, and has been upset by the intervention of $O_2$.

We can deduce the statistical relation between $O_1$’s first and second results ($\alpha_i$ and $\alpha_k$) by our previous method of assigning a measure to the elements of the superposition (3.7). The measure assigned to the $(i,j,k)$th element is then:

$$M_{ijk} = |(\phi_i,\psi^S)(\eta_j,\phi_i)(\phi_k,\eta_j)|^2. \tag{3.8}$$

This measure is equivalent, in this case, to the probabilities assigned by the orthodox theory (Process 1), where $O_2$’s observation is regarded as having converted each state $\phi_i$ into a non-interfering mixture of states $\eta_j$, weighted with probabilities $|(\eta_j,\phi_i)|^2$, upon which $O_1$ makes his second observation.
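The measure (3.8) can be evaluated explicitly for a simple case. The Python sketch below is a hedged illustration, assuming a two-dimensional system with a hypothetical eigenbasis `phi` of $A$, a hypothetical eigenbasis `eta` of $B$ tilted at 45° to it, and an initial state equal to the first eigenfunction of $A$. None of these choices come from the text.

```python
import math
from itertools import product

def inner(u, v):
    """Inner product (u, v), conjugate-linear in the first argument."""
    return sum(x.conjugate() * y for x, y in zip(u, v))

s = 1 / math.sqrt(2)
phi = [(1.0, 0.0), (0.0, 1.0)]        # eigenfunctions of A (hypothetical)
eta = [(s, s), (s, -s)]               # eigenfunctions of B (hypothetical)
psi = (1.0, 0.0)                      # initial system state, here phi[0]

# M[i, j, k] = |(phi_i, psi)(eta_j, phi_i)(phi_k, eta_j)|^2, as in (3.8).
M = {(i, j, k): abs(inner(phi[i], psi)
                    * inner(eta[j], phi[i])
                    * inner(phi[k], eta[j])) ** 2
     for i, j, k in product(range(2), repeat=3)}

total = sum(M.values())
# Measure of the branches in which O_1's two results for A differ.
disagree = sum(m for (i, j, k), m in M.items() if i != k)
```

With these choices half of the total measure lies on branches where $O_1$’s second result differs from his first, even though the system started in an eigenstate of $A$. This is the upset caused by the intervening observation of $B$.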

Note, however, that this equivalence with the statistical results obtained by considering that $O_2$’s observation changed the system state into a mixture, holds true only so long as $O_1$’s second observation is restricted to the system. If he were to attempt to simultaneously determine a property of the system as well as of $O_2$, interference effects might become important. The description of the states relative to $O_1$, after $O_2$’s observation, as non-interfering mixtures is therefore incomplete.

Case 3: We suppose that two systems $S_1$ and $S_2$ are correlated but no longer interacting, and that $O_1$ measures property $A$ in $S_1$, and $O_2$ property $B$ in $S_2$.

We wish to see whether $O_2$’s intervention with $S_2$ can in any way affect $O_1$’s results in $S_1$, so that perhaps signals might be sent by these means. We shall assume that the initial state for the system pair is

$$\psi^{S_1+S_2} = \sum_i a_i\,\phi_i^{S_1}\phi_i^{S_2}. \tag{3.9}$$

We now allow $O_1$ to observe $A$ in $S_1$, so that after this observation the total state becomes:

$$\psi'^{\,S_1+S_2+O_1+O_2} = \sum_i a_i\,\phi_i^{S_1}\phi_i^{S_2}\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots]}. \tag{3.10}$$

$O_1$ can of course continue to repeat the determination, obtaining the same result each time.

We now suppose that $O_2$ determines $B$ in $S_2$, which results in

$$\psi''^{\,S_1+S_2+O_1+O_2} = \sum_{i,j} a_i\,(\eta_j^{S_2},\phi_i^{S_2})\,\phi_i^{S_1}\eta_j^{S_2}\,\psi^{O_1}_{[\ldots,\alpha_i]}\,\psi^{O_2}_{[\ldots,\beta_j]}. \tag{3.11}$$

However, in this case, as distinct from Case 2, we see that the intervention of $O_2$ in no way affects $O_1$’s determinations, since $O_1$ is still perfectly correlated to the states $\phi_i^{S_1}$ of $S_1$, and any further observations by $O_1$ will lead to the same results as the earlier observations. Thus each memory sequence for $O_1$ continues without change due to $O_2$’s observation, and such a scheme could not be used to send any signals.

Furthermore, we see that the result (3.11) is arrived at even in the case that $O_2$ should make his determination before that of $O_1$. Therefore any expectations for the outcome of $O_1$’s first observation are in no way affected by whether or not $O_2$ performs his observation before that of $O_1$. This is true because the expectation of the outcome for $O_1$ can be computed from (3.10), which is the same whether $O_2$ performs his measurement before or after $O_1$.

It is therefore seen that one observer’s observation upon one system of a correlated, but non-interacting pair of systems, has no effect on the remote system, in the sense that the outcome or expected outcome of any experiments by another observer on the remote system are not affected. Paradoxes like that of Einstein-Rosen-Podolsky 5 which are concerned with such correlated, non-interacting, systems are thus easily understood in the present scheme.
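The absence of any signalling in Case 3 can be checked on a small example. The following Python sketch is only an illustration under assumed inputs: two-dimensional systems, real coefficients `a` for the correlated state (3.9), and a one-parameter family of real bases for $O_2$’s quantity labelled by a hypothetical angle `theta`. It sums the square-amplitude measure of the branches of (3.11) over $O_2$’s result $j$ to obtain the measure of each result $i$ for $O_1$.

```python
import math

def marginal_for_o1(a, theta):
    """Measure of each result i of O_1 after O_2 has observed S_2 in the
    basis rotated by theta, from the branch weights of (3.11)."""
    phi = [(1.0, 0.0), (0.0, 1.0)]                        # eigenfunctions on S_2
    eta = [(math.cos(theta), math.sin(theta)),            # O_2's (real) basis
           (-math.sin(theta), math.cos(theta))]
    branch = {(i, j): abs(a[i]) ** 2
              * abs(sum(e * p for e, p in zip(eta[j], phi[i]))) ** 2
              for i in range(2) for j in range(2)}
    return [sum(branch[i, j] for j in range(2)) for i in range(2)]

a = (0.6, 0.8)                                            # hypothetical coefficients
marginals = [marginal_for_o1(a, theta) for theta in (0.0, 0.3, 1.2)]
```

Every entry of `marginals` equals $[\,|a_0|^2, |a_1|^2\,]$ up to rounding, whatever angle $O_2$ chooses, so the weights of $O_1$’s results carry no trace of $O_2$’s choice of observable.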

Many further combinations of several observers and systems can be easily studied in the present framework, and all questions answered by first writing down the final state for the situation with the aid of Rules 1 and 2, and then noticing the relations between the elements of the memory sequences.

5

Einstein [8].


V.  SUPPLEMENTARY  TOPICS 


We have now completed the abstract treatment of measurement and observation, with the deduction that the statistical predictions of the usual form of quantum theory (Process 1) will appear to be valid to all observers. We have therefore succeeded in placing our theory in correspondence with experience, at least insofar as the ordinary theory correctly represents experience.

We should like to emphasize that this deduction was carried out by using only the principle of superposition, and the postulate that an observation has the property that if the observed variable has a definite value in the object-system then it will remain definite and the observer will perceive this value. This treatment is therefore valid for any possible quantum interpretation of observation processes, i.e., any way in which one can interpret wave functions as describing observers, as well as for any form of quantum mechanics for which the superposition principle for states is maintained. Our abstract discussion of observation is therefore logically complete, in the sense that our results for the subjective experience of observers are correct, if there are any observers at all describable by wave mechanics.1

In this chapter we shall consider a number of diverse topics from the point of view of our pure wave mechanics, in order to supplement the abstract discussion and give a feeling for the new viewpoint. Since we are now mainly interested in elucidating the reasonableness of the theory, we shall often restrict ourselves to plausibility arguments, rather than detailed proofs.

1

They are, of course, vacuously correct otherwise.

§1. Macroscopic objects and classical mechanics

In the light of our knowledge about the atomic constitution of matter, any “object” of macroscopic size is composed of an enormous number of constituent particles. The wave function for such an object is then in a space of fantastically high dimension ($3N$, if $N$ is the number of particles). Our present problem is to understand the existence of macroscopic objects, and to relate their ordinary (classical) behavior in the three dimensional world to the underlying wave mechanics in the higher dimensional space.

Let us begin by considering a relatively simple case. Suppose that we place in a box an electron and a proton, each in a definite momentum state, so that the position amplitude density of each is uniform over the whole box. After a time we would expect a hydrogen atom in the ground state to form, with ensuing radiation. We notice, however, that the position amplitude density of each particle is still uniform over the whole box. Nevertheless the amplitude distributions are now no longer independent, but correlated. In particular, the conditional amplitude density for the electron, conditioned by any definite proton (or centroid) position, is not uniform, but is given by the familiar ground state wave function for the hydrogen atom. What we mean by the statement, “a hydrogen atom has formed in the box,” is just that this correlation has taken place: a correlation which insures that the relative configuration for the electron, for a definite proton position, conforms to the customary ground state configuration.

The  wave  function  for  the  hydrogen  atom  can  be  represented  as  a 
product  of  a  centroid  wave  function  and  a  wave  function  over  relative 
coordinates,  where  the  centroid  wave  function  obeys  the  wave  equation 
for  a  particle  with  mass  equal  to  the  total  mass  of  the  proton-electron  sys¬ 
tem.  Therefore,  if  we  now  open  our  box,  the  centroid  wave  function  will 
spread  with  time  in  the  usual  manner  of  wave  packets,  to  eventually  occu¬ 
py  a  vast  region  of  space.  The  relative  configuration  (described  by  the 
relative coordinate state function) has, however, a permanent nature, since
it represents a bound state, and it is this relative configuration which we
usually  think  of  as  the  object  called  the  hydrogen  atom.  Therefore,  no 
matter  how  indefinite  the  positions  of  the  individual  particles  become  in 
the  total  state  function  (due  to  the  spreading  of  the  centroid),  this  state 
can  be  regarded  as  giving  (through  the  centroid  wave  function)  an  ampli¬ 
tude  distribution  over  a  comparatively  definite  object,  the  tightly  bound 
electron-proton  system.  The  general  state,  then,  does  not  describe  any 
single  such  definite  object,  but  a  superposition  of  such  cases  with  the 
object  located  at  different  positions. 

In  a  similar  fashion  larger  and  more  complex  objects  can  be  built  up 
through  strong  correlations  which  bind  together  the  constituent  particles. 

It  is  still  true  that  the  general  state  function  for  such  a  system  may  lead 
to  marginal  position  densities  for  any  single  particle  (or  centroid)  which 
extend  over  large  regions  of  space.  Nevertheless  we  can  speak  of  the 
existence  of  a  relatively  definite  object,  since  the  specification  of  a 
single  position  for  a  particle,  or  the  centroid,  leads  to  the  case  where  the 
relative  position  densities  of  the  remaining  particles  are  distributed 
closely  about  the  specified  one,  in  a  manner  forming  the  comparatively 
definite  object  spoken  of. 

Suppose,  for  example,  we  begin  with  a  cannonball  located  at  the  origin, 
described  by  a  state  function: 

$\psi_{[c_{(0,0,0)}]}$ ,

where the subscript indicates that the total state function $\psi$ describes a
system of particles bound together so as to form an object of the size and
shape of a cannonball, whose centroid is located (approximately) at the
origin, say in the form of a real gaussian wave packet of small dimensions,
with variance $\sigma_0^2$ for each dimension.

If we now allow a long lapse of time, the centroid of the system will
spread in the usual manner to occupy a large region of space. (The spread
in each dimension after time t will be given by
$\sigma_t^2 = \sigma_0^2 + \hbar^2 t^2 / (4 \sigma_0^2 m^2)$,
where m is the mass.) Nevertheless, for any specified centroid position,
the particles, since they remain in bound states, have distributions which
again correspond to the fairly well defined size and shape of the cannonball.
Thus the total state can be regarded as a (continuous) superposition
of states

$\int a_{xyz}\, \psi_{[c_{(x,y,z)}]}\, dx\, dy\, dz$ ,

each of which ($\psi_{[c_{(x,y,z)}]}$) describes a cannonball at the position
(x,y,z). The coefficients $a_{xyz}$ of the superposition then correspond to
the centroid distribution.


It  is  not  true  that  each  individual  particle  spreads  independently  of 
the  rest,  in  which  case  we  would  have  a  final  state  which  is  a  grand  super¬ 
position  of  states  in  which  the  particles  are  located  independently  every¬ 
where.  The  fact  that  they  are  in  bound  states  restricts  our  final  state  to 
a superposition of “cannonball” states. The wave function for the centroid
can therefore be taken as a representative wave function for the
whole object.
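
The mass dependence of this spreading can be made concrete with a short numerical sketch. It evaluates the standard free gaussian packet formula $\sigma_t^2 = \sigma_0^2 + \hbar^2 t^2 / (4 \sigma_0^2 m^2)$; the masses, the initial variance and the elapsed times below are illustrative assumptions and are not taken from the text.

```python
import math

HBAR = 1.054571817e-34  # reduced Planck constant, J*s


def spread_variance(sigma0_sq, mass, t):
    """Position variance (m^2) of a free gaussian packet after time t (s)."""
    return sigma0_sq + (HBAR * t) ** 2 / (4.0 * sigma0_sq * mass ** 2)


if __name__ == "__main__":
    sigma0_sq = 1.0e-20   # assumed initial variance: (0.1 nm)^2
    year = 3.156e7        # seconds in one year, approximately
    # Assumed masses (kg): a 5 kg cannonball and a free electron.
    for label, mass, t in [("cannonball, 1 year", 5.0, year),
                           ("electron, 1 second", 9.109e-31, 1.0)]:
        sigma_t = math.sqrt(spread_variance(sigma0_sq, mass, t))
        print(f"{label}: sigma_t = {sigma_t:.3e} m")
```

Under these assumed numbers the centroid width of the cannonball is essentially unchanged after a year, while the electron packet grows to a very large region within a second. For a massive object the "long lapse of time" must therefore be very long indeed.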

It  is  thus  in  this  sense  of  correlations  between  constituent  particles 
that  definite  macroscopic  objects  can  exist  within  the  framework  of  pure 
wave  mechanics.  The  building  up  of  correlations  in  a  complex  system 
supplies  us  with  a  mechanism  which  also  allows  us  to  understand  how 
condensation  phenomena  (the  formation  of  spatial  boundaries  which  sepa¬ 
rate  phases  of  different  physical  or  chemical  properties)  can  be  controlled 
by the wave equation, answering a point raised by Schrödinger.

Classical  mechanics,  also,  enters  our  scheme  in  the  form  of  correla¬ 
tion  laws.  Let  us  consider  a  system  of  objects  (in  the  previous  sense), 
such  that  the  centroid  of  each  object  has  initially  a  fairly  well  defined 
position and momentum (e.g., let the wave function for the centroids consist
of a product of gaussian wave packets). As time progresses, the
centers of the square amplitude distributions for the objects will move in
a  manner  approximately  obeying  the  laws  of  motion  of  classical  mechanics, 
with  the  degree  of  approximation  depending  upon  the  masses  and  the 
length  of  time  considered,  as  is  well  known.  (Note  that  we  do  not  mean 
to  imply  that  the  wave  packets  of  the  individual  objects  remain  indepen¬ 
dent  if  they  are  interacting.  They  do  not.  The  motion  that  we  refer  to  is 
that  of  the  centers  of  the  marginal  distributions  for  the  centroids  of  the 
bodies.) 

The  general  state  of  a  system  of  macroscopic  objects  does  not,  how¬ 
ever,  ascribe  any  nearly  definite  positions  and  momenta  to  the  individual 
bodies.  Nevertheless,  any  general  state  can  at  any  instant  be  analyzed 
into  a  superposition  of  states  each  of  which  does  represent  the  bodies 
with  fairly  well  defined  positions  and  momenta.  Each  of  these  states 
then  propagates  approximately  according  to  classical  laws,  so  that  the 
general  state  can  be  viewed  as  a  superposition  of  quasi-classical  states 
propagating  according  to  nearly  classical  trajectories.  In  other  words,  if 
the  masses  are  large  or  the  time  short,  there  will  be  strong  correlations 
between  the  initial  (approximate)  positions  and  momenta  and  those  at  a 
later  time,  with  the  dependence  being  given  approximately  by  classical 
mechanics. 

Since large scale objects obeying classical laws have a place in our
theory of pure wave mechanics, we have justified the introduction of
models for observers consisting of classically describable, automatically
functioning machinery, and the treatment of observation of Chapter IV is
non-vacuous.

For any $\epsilon$ one can construct a complete orthonormal set of (one particle)
states $\phi_{\mu,\nu}$, where the double index $\mu,\nu$ refers to the approximate position and
momentum, and for which the expected position and momentum values run independently
through sets of approximately uniform density, such that the position and
momentum uncertainties, $\sigma_x$ and $\sigma_p$, satisfy $\sigma_x \le C\epsilon$ and
$\sigma_p \le C\hbar/(2\epsilon)$ for each $\phi_{\mu,\nu}$, where C is a constant ~ 60.
The uncertainty product then satisfies $\sigma_x \sigma_p \le C^2 \hbar/2$, about 3,600
times the minimum allowable, but still sufficiently low
for macroscopic objects. This set can then be used as a basis for our decomposition
into states where every body has a roughly defined position and momentum.
For a more complete discussion of this set see von Neumann [17], pp. 406-407.

Let  us  now  consider  the  result  of  an  observation  (considered  along 
the  lines  of  Chapter  IV)  performed  upon  a  system  of  macroscopic  bodies 
in  a  general  state.  The  observer  will  not  become  aware  of  the  fact  that 
the  state  does  not  correspond  to  definite  positions  and  momenta  (i.e.,  he 
will  not  see  the  objects  as  “smeared  out”  over  large  regions  of  space) 
but  will  himself  simply  become  correlated  with  the  system  —  after  the  ob¬ 
servation  the  composite  system  of  objects  +  observer  will  be  in  a  super¬ 
position  of  states,  each  element  of  which  describes  an  observer  who  has 
perceived  that  the  objects  have  nearly  definite  positions  and  momenta, 
and  for  whom  the  relative  system  state  is  a  quasi-classical  state  in  the 
previous  sense,  and  furthermore  to  whom  the  system  will  appear  to  behave 
according  to  classical  mechanics  if  his  observation  is  continued.  We  see, 
therefore,  how  the  classical  appearance  of  the  macroscopic  world  to  us 
can  be  explained  in  the  wave  theory. 

§2.  Amplification  processes 

In Chapters III and IV we discussed abstract measuring processes,
which  were  considered  to  be  simply  a  direct  coupling  between  two  sys¬ 
tems,  the  object-system  and  the  apparatus  (or  observer).  There  is,  how¬ 
ever,  in  actuality  a  whole  chain  of  intervening  systems  linking  a  micro¬ 
scopic  system  to  a  macroscopic  observer.  Each  link  in  the  chain  of  inter¬ 
vening  systems  becomes  correlated  to  its  predecessor,  so  that  the  result 
is  an  amplification  of  effects  from  the  microscopic  object-system  to  a 
macroscopic  apparatus,  and  then  to  the  observer. 

The  amplification  process  depends  upon  the  ability  of  the  state  of  one 
micro-system  (particle,  for  example)  to  become  correlated  with  the  states 
of  an  enormous  number  of  other  microscopic  systems,  the  totality  of  which 
we  shall  call  a  detection  system.  For  example,  the  totality  of  gas  atoms 
in  a  Geiger  counter,  or  the  water  molecules  in  a  cloud  chamber,  constitute 
such a detection system.

The amplification is accomplished by arranging the condition of the
detection  system  so  that  the  states  of  the  individual  micro-systems  of  the 
detector  are  metastable,  in  a  way  that  if  one  micro-system  should  fall  from 
its  metastable  state  it  would  influence  the  reduction  of  others.  This  type 
of  arrangement  leaves  the  entire  detection  system  metastable  against 
chain  reactions  which  involve  a  large  number  of  its  constituent  systems. 

In  a  Geiger  counter,  for  example,  the  presence  of  a  strong  electric  field 
leaves  the  gas  atoms  metastable  against  ionization.  Furthermore,  the 
products  of  the  ionization  of  one  gas  atom  in  a  Geiger  counter  can  cause 
further  ionizations,  in  a  cascading  process.  The  operation  of  cloud  cham¬ 
bers  and  photographic  films  is  also  due  to  metastability  against  such 
chain  reactions. 

The  chain  reactions  cause  large  numbers  of  the  micro-systems  of  the 
detector  to  behave  as  a  unit,  all  remaining  in  the  metastable  state,  or  all 
discharging.  In  this  manner  the  states  of  a  sufficiently  large  number  of 
micro-systems  are  correlated,  so  that  one  can  speak  of  the  whole  ensemble 
being  in  a  state  of  discharge,  or  not. 

For example, there are essentially only two macroscopically distinguishable
states for a Geiger counter: discharged or undischarged. The
correlation of large numbers of gas atoms, due to the chain reaction effect,
implies that either very few, or else very many of the gas atoms are ionized
at a given time. Consider the complete state function $\psi^G$ of a Geiger
counter, which is a function of all the coordinates of all of the constituent
particles. Because of the correlation of the behavior of a large number of
the constituent gas atoms, the total state $\psi^G$ can always be written as
a superposition of two states

(2.1)  $\psi^G = a_1 \psi_{[D]} + a_2 \psi_{[U]}$ ,

where $\psi_{[U]}$ signifies a state where only a small number of gas atoms
are ionized, and $\psi_{[D]}$ a state for which a large number are ionized.

To see that the decomposition (2.1) is valid, expand $\psi^G$ in terms of
individual gas atom stationary states:

(2.2)  $\psi^G = \sum_{i,j,\ldots,k} a_{ij\ldots k}\, \psi_i^{S_1} \psi_j^{S_2} \cdots \psi_k^{S_n}$ ,

where $\psi_\ell^{S_r}$ is the $\ell$th state of atom r. Each element of the
superposition (2.2)

(2.3)  $\psi_i^{S_1} \psi_j^{S_2} \cdots \psi_k^{S_n}$

must contain either a very large number of atoms in ionized states, or else
a very small number, because of the chain reaction effect. By choosing
some medium-sized number as a dividing line, each element of (2.2) can
be placed in one of the two categories, high number or low number of
ionized atoms. If we then carry out the sum (2.2) over only those elements
of the first category, we get a state (and coefficient)

(2.4)  $a_1 \psi_{[D]} = \sum'_{ij\ldots k} a_{ij\ldots k}\, \psi_i^{S_1} \psi_j^{S_2} \cdots \psi_k^{S_n}$ .

The state $\psi_{[D]}$ is then a state where a large number of particles are
ionized. The subscript [D] indicates that it describes a Geiger counter
which has discharged. If we carry out the sum over the remaining terms
of (2.2) we get in a similar fashion:

(2.5)  $a_2 \psi_{[U]} = \sum''_{ij\ldots k} a_{ij\ldots k}\, \psi_i^{S_1} \psi_j^{S_2} \cdots \psi_k^{S_n}$ ,

where [U] indicates the undischarged condition. Combining (2.4) and
(2.5) we arrive at the desired relation (2.1). So far, this method of decomposition
can be applied to any system, whether or not it has the chain reaction
property. However, in our case, more is implied, namely that the
spread of the number of ionized atoms in both $\psi_{[D]}$ and $\psi_{[U]}$ will be
small compared to the separation of their averages, due to the fact that
the existence of the chain reactions means that either many or else few
atoms will be ionized, with the middle ground virtually excluded.

This  type  of  decomposition  is  also  applicable  to  all  other  detection 
devices  which  are  based  upon  this  chain  reaction  principle  (such  as  cloud 
chambers,  photo  plates,  etc.). 
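
The splitting of a superposition at a dividing line can be sketched for a toy detector. The sketch below assumes that each atom has only two states (ionized or not), so that a configuration is a tuple of 0s and 1s, and it uses a made-up four-atom state whose weight lies near "none ionized" and "all ionized". These simplifications, and all function names, are assumptions introduced for illustration.

```python
import math


def split_by_ionization(amplitudes, dividing_line):
    """Split a superposition into 'discharged' and 'undischarged' parts.

    amplitudes maps a configuration tuple (1 = atom ionized, 0 = not)
    to its amplitude; dividing_line is the medium-sized count.
    """
    discharged, undischarged = {}, {}
    for config, amp in amplitudes.items():
        part = discharged if sum(config) > dividing_line else undischarged
        part[config] = amp
    return discharged, undischarged


def norm(part):
    """Norm of an unnormalized component; plays the role of |a_1| or |a_2|."""
    return math.sqrt(sum(abs(amp) ** 2 for amp in part.values()))


def mean_ionized(part):
    """Average number of ionized atoms within one component."""
    weight = sum(abs(amp) ** 2 for amp in part.values())
    return sum(sum(cfg) * abs(amp) ** 2 for cfg, amp in part.items()) / weight


# Assumed toy state of four atoms (not taken from the text).
TOY_STATE = {
    (0, 0, 0, 0): 0.7,
    (0, 1, 0, 0): 0.1,
    (1, 1, 1, 0): 0.1,
    (1, 1, 1, 1): 0.7,
}

if __name__ == "__main__":
    d_part, u_part = split_by_ionization(TOY_STATE, dividing_line=2)
    print("|a_1| =", norm(d_part), " mean ionized =", mean_ionized(d_part))
    print("|a_2| =", norm(u_part), " mean ionized =", mean_ionized(u_part))
```

The split itself works for any state. What the chain reaction property adds, in this toy as in the text, is that the two components have mean ionization counts (about 3.98 and 0.02 here) that are far apart compared with the spread inside each component.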

We consider now the coupling of such a detection device to another
micro-system (object-system) for the purpose of measurement. If it is true
that the initial object-system state $\phi_1$ will at some time t trigger the
chain reaction, so that the state of the counter becomes $\psi_{[D]}$, while the
object-system state $\phi_2$ will not, then it is still true that the initial
object-system state $a_1\phi_1 + a_2\phi_2$ will result in the superposition

(2.6)  $a_1 \phi_1' \psi_{[D]} + a_2 \phi_2' \psi_{[U]}$

at time t, where $\phi_1'$ and $\phi_2'$ denote the object-system states into which
$\phi_1$ and $\phi_2$ have evolved.

For example, let us suppose that a particle whose state is a wave
packet $\phi$, of linear extension greater than that of our Geiger counter,
approaches the counter. Just before it reaches the counter, it can be
decomposed into a superposition $\phi = a_1\phi_1 + a_2\phi_2$ ($\phi_1$, $\phi_2$ orthogonal)
where $\phi_1$ has non-zero amplitude only in the region before the counter
and $\phi_2$ has non-zero amplitude elsewhere (so that $\phi_1$ is a packet which
will entirely pass through the counter while $\phi_2$ will entirely miss the
counter). The initial total state for the system particle + counter is then:

$\phi\, \psi_{[U]} = (a_1\phi_1 + a_2\phi_2)\, \psi_{[U]}$ ,

where $\psi_{[U]}$ is the initial (assumed to be undischarged) state of the
counter.

But at a slightly later time $\phi_1$ is changed to $\phi_1'$, after traversing
the counter and causing it to go into a discharged state $\psi_{[D]}$, while $\phi_2$
passes by into a state $\phi_2'$ leaving the counter in an undischarged state
$\psi_{[U]}$. Superposing these results, the total state at the later time is

(2.7)  $a_1 \phi_1' \psi_{[D]} + a_2 \phi_2' \psi_{[U]}$

in accordance with (2.6). Furthermore, the relative particle state for
$\psi_{[D]}$, $\phi_1'$, is a wave packet emanating from the counter, while the relative
state for $\psi_{[U]}$ is a wave with a “shadow” cast by the counter. The
counter therefore serves as an apparatus which performs an approximate
position measurement on the particle.
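
The role of linearity in this argument can be illustrated by a minimal sketch. It assumes an idealized interaction in which the object state labelled "phi1" flips the counter between "U" and "D" while "phi2" leaves it alone, and it ignores the change of the particle states themselves. The labels and the rule are assumptions made for illustration only.

```python
def couple(state):
    """Toy counter interaction acting linearly on a superposition.

    state maps (object_label, counter_label) to an amplitude. The rule
    only permutes basis states, so it is a unitary transformation.
    """
    out = {}
    for (obj, counter), amp in state.items():
        if obj == "phi1":
            counter = "D" if counter == "U" else "U"
        out[(obj, counter)] = out.get((obj, counter), 0) + amp
    return out


def initial_state(a1, a2):
    """(a1*phi1 + a2*phi2) times the undischarged counter state."""
    return {("phi1", "U"): a1, ("phi2", "U"): a2}


if __name__ == "__main__":
    final = couple(initial_state(0.6, 0.8))
    for (obj, counter), amp in sorted(final.items()):
        print(obj, counter, amp, "weight", abs(amp) ** 2)
```

Because the rule is applied term by term, the counter ends up in neither a definite discharged nor a definite undischarged state. The output contains both elements, with the same coefficients as the initial object-system superposition.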

No  matter  what  the  complexity  or  exact  mechanism  of  a  measuring 
process,  the  general  superposition  principle  as  stated  in  Chapter  III,  §3, 
remains  valid,  and  our  abstract  discussion  is  unaffected.  It  is  a  vain  hope 
that  somewhere  embedded  in  the  intricacy  of  the  amplification  process  is 
a  mechanism  which  will  somehow  prevent  the  macroscopic  apparatus  state 
from  reflecting  the  same  indefiniteness  as  its  object-system. 

§3.  Reversibility  and  irreversibility 

Let us return, for the moment, to the probabilistic interpretation of
quantum mechanics based on Process 1 as well as Process 2. Suppose
that we have a large number of identical systems (ensemble), and that the
$j$th system is in the state $\psi^j$. Then for purposes of calculating expectation
values for operators over the ensemble, the ensemble is represented
by the mixture of states $\psi^j$ weighted with 1/N, where N is the number
of systems, for which the density operator is:

(3.1)  $\rho = \frac{1}{N} \sum_j [\psi^j]$ ,

where $[\psi^j]$ denotes the projection operator on $\psi^j$. This density operator,
in turn, is equivalent to a density operator which is a sum of projections
on orthogonal states (the eigenstates of $\rho$):4

(3.2)  $\rho = \sum_i P_i [\eta_i]$ ,  $(\eta_i, \eta_j) = \delta_{ij}$ ,  $\sum_i P_i = 1$ ,

so that any ensemble is always equivalent to a mixture of orthogonal
states, which representation we shall henceforth assume.

3 Cf. Chapter III, §1.

4 See Chapter III, §2, particularly footnote 6, p. 46.
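
As a small numerical illustration of passing from an ensemble to an equivalent mixture of orthogonal states, the following sketch builds $\rho = \frac{1}{N}\sum_j [\psi^j]$ for two-state systems and extracts the eigenvalue weights $P_i$. The restriction to two components and the particular ensemble are assumptions chosen to keep the arithmetic elementary.

```python
import math


def projector(psi):
    """Matrix of the projection [psi] for a normalized two-component state."""
    return [[a * complex(b).conjugate() for b in psi] for a in psi]


def ensemble_density(states):
    """rho = (1/N) * sum_j [psi^j] for a list of two-component states."""
    n = len(states)
    rho = [[0j, 0j], [0j, 0j]]
    for psi in states:
        proj = projector(psi)
        for r in range(2):
            for c in range(2):
                rho[r][c] += proj[r][c] / n
    return rho


def eigenvalues_2x2(rho):
    """Eigenvalues of a 2x2 Hermitian matrix: the weights P_i."""
    trace = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    return (trace + disc) / 2.0, (trace - disc) / 2.0


if __name__ == "__main__":
    s = 1.0 / math.sqrt(2.0)
    # Assumed ensemble: two non-orthogonal states, one system in each.
    rho = ensemble_density([(1.0, 0.0), (s, s)])
    print("P_i =", eigenvalues_2x2(rho))
```

Although the two states of the assumed ensemble are not orthogonal, the resulting operator has two orthogonal eigenstates whose weights are non-negative and sum to one, which is the representation adopted in the text.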

Suppose that a quantity A, with (non-degenerate) eigenstates $\{\phi_j\}$
is measured in each system of the ensemble. This measurement has the
effect of transforming each state $\eta_i$ into the state $\phi_j$, with probability
$|(\phi_j,\eta_i)|^2$; i.e., it will transform a large ensemble of systems in the state
$\eta_i$ into an ensemble represented by the mixture whose density operator is
$\sum_j |(\phi_j,\eta_i)|^2 [\phi_j]$. Extending this result to the case where the original
ensemble is a mixture of the $\eta_i$ weighted by $P_i$ ((3.2)), we find that the
density operator $\rho$ is transformed by the measurement of A into the new
density operator $\rho'$:

(3.3)  $\rho' = \sum_i P_i \sum_j |(\eta_i,\phi_j)|^2 [\phi_j] = \sum_j \Big( \sum_i P_i (\phi_j,\eta_i)(\eta_i,\phi_j) \Big) [\phi_j]$
       $= \sum_j \Big( \phi_j, \sum_i P_i [\eta_i]\, \phi_j \Big) [\phi_j] = \sum_j (\phi_j, \rho\,\phi_j) [\phi_j]$ .

This is the general law by which mixtures change through Process 1.
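
The law $\rho' = \sum_j (\phi_j, \rho\,\phi_j)[\phi_j]$ can be tried out on a two-state system. In the sketch below the eigenstates of A are assumed to be the standard basis rotated by an angle, and the initial mixture is an assumed diagonal one; neither choice comes from the text.

```python
import math


def expectation(phi, rho):
    """(phi, rho phi) for a two-component state phi and 2x2 matrix rho."""
    return sum(complex(phi[r]).conjugate() * rho[r][c] * phi[c]
               for r in range(2) for c in range(2))


def process_1(rho, basis):
    """rho' = sum_j (phi_j, rho phi_j) [phi_j] for an orthonormal basis."""
    new_rho = [[0j, 0j], [0j, 0j]]
    for phi in basis:
        weight = expectation(phi, rho)
        for r in range(2):
            for c in range(2):
                new_rho[r][c] += weight * phi[r] * complex(phi[c]).conjugate()
    return new_rho


def rotated_basis(theta):
    """Assumed eigenstates of A: the standard basis rotated by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c, s), (-s, c)]


if __name__ == "__main__":
    rho = [[0.9, 0.0], [0.0, 0.1]]          # assumed initial mixture
    basis = rotated_basis(math.pi / 6.0)
    rho_after = process_1(rho, basis)
    print("new weights:", [expectation(phi, rho_after).real for phi in basis])
```

For these assumed inputs the weights 0.9 and 0.1 of the original mixture become approximately 0.7 and 0.3 on the eigenstates of A, while a measurement in the eigenbasis of $\rho$ itself (angle zero) leaves the mixture unchanged.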

However, even when no measurements are taking place, the states of
an ensemble are changing according to Process 2, so that after a time
interval t each state $\psi$ will be transformed into a state $\psi' = U_t\psi$,
where $U_t$ is a unitary operator. This natural motion has the consequence
that each mixture $\rho = \sum_i P_i [\eta_i]$ is carried into the mixture
$\rho' = \sum_i P_i [U_t\eta_i]$ after a time t. But for every state $\xi$,

(3.4)  $\rho'\xi = \sum_i P_i [U_t\eta_i]\,\xi = \sum_i P_i (U_t\eta_i, \xi)\, U_t\eta_i$
       $= U_t \sum_i P_i (\eta_i, U_t^{-1}\xi)\, \eta_i = U_t \sum_i P_i [\eta_i] (U_t^{-1}\xi)$
       $= (U_t\, \rho\, U_t^{-1})\, \xi$ .

Therefore

(3.5)  $\rho' = U_t\, \rho\, U_t^{-1}$ ,

which is the general law for the change of a mixture according to Process 2.

We are now interested in whether or not we can get from any mixture to
another by means of these two processes, i.e., if for any pair $\rho$, $\rho'$, there
exist quantities A which can be measured and unitary (time dependence)
operators U such that $\rho$ can be transformed into $\rho'$ by suitable applications
of Processes 1 and 2. We shall see that this is not always possible,
and that Process 1 can cause irreversible changes in mixtures.

For each mixture $\rho$ we define a quantity $I_\rho$:

(3.6)  $I_\rho = \mathrm{Trace}\,(\rho \ln \rho)$ .

This number, $I_\rho$, has the character of information. If $\rho = \sum_i P_i [\eta_i]$, a
mixture of orthogonal states $\eta_i$ weighted with $P_i$, then $I_\rho$ is simply
the information of the distribution $P_i$ over the eigenstates of $\rho$ (relative
to the uniform measure). (Trace $(\rho \ln \rho)$ is a unitary invariant and is
proportional to the negative of the entropy of the mixture, as discussed in
Chapter III, §2.)

Process 2 therefore has the property that it leaves $I_\rho$ unchanged,
because

(3.7)  $I_{\rho'} = \mathrm{Trace}\,(\rho' \ln \rho') = \mathrm{Trace}\,(U_t \rho U_t^{-1} \ln U_t \rho U_t^{-1})$
       $= \mathrm{Trace}\,(U_t\, \rho \ln \rho\; U_t^{-1}) = \mathrm{Trace}\,(\rho \ln \rho) = I_\rho$ .
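
Both statements about Process 2, that the evolved mixture equals $U_t \rho U_t^{-1}$ and that Trace $(\rho \ln \rho)$ is unchanged, can be checked numerically for a two-state system. The unitary used below, with its two parameters, and the initial weights are assumptions chosen only for illustration.

```python
import cmath
import math


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]


def dagger(a):
    """Conjugate transpose; equals the inverse for a unitary matrix."""
    return [[complex(a[c][r]).conjugate() for c in range(2)] for r in range(2)]


def unitary(theta, phase):
    """An assumed two-parameter family of 2x2 unitary matrices."""
    c, s = math.cos(theta), math.sin(theta)
    e = cmath.exp(1j * phase)
    return [[c, -s * e.conjugate()], [s * e, c]]


def evolve(rho, u):
    """rho' = U rho U^{-1}."""
    return matmul(matmul(u, rho), dagger(u))


def evolve_states(weights, u):
    """sum_i P_i [U eta_i], with eta_i the standard basis vectors."""
    rho = [[0j, 0j], [0j, 0j]]
    for i, p in enumerate(weights):
        col = [u[0][i], u[1][i]]
        for r in range(2):
            for c in range(2):
                rho[r][c] += p * col[r] * complex(col[c]).conjugate()
    return rho


def information(rho):
    """Trace(rho ln rho), computed from the eigenvalues of a 2x2 rho."""
    trace = (rho[0][0] + rho[1][1]).real
    det = (rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]).real
    disc = math.sqrt(max(trace * trace - 4.0 * det, 0.0))
    eig = ((trace + disc) / 2.0, (trace - disc) / 2.0)
    return sum(x * math.log(x) for x in eig if x > 1e-15)


if __name__ == "__main__":
    u = unitary(0.7, 1.3)
    rho = [[0.9, 0.0], [0.0, 0.1]]
    print(information(rho), information(evolve(rho, u)))
```

The function `evolve_states` follows the mixture state by state and `evolve` conjugates the operator as a whole. For a unitary `u` the two agree, and the printed information values coincide up to rounding.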

Process 1, on the other hand, can decrease $I_\rho$ but never increase it.
According to (3.3):

(3.8)  $\rho' = \sum_j (\phi_j, \rho\,\phi_j) [\phi_j] = \sum_{i,j} P_i |(\eta_i,\phi_j)|^2 [\phi_j] = \sum_j P_j' [\phi_j]$ ,

where $P_j' = \sum_i P_i T_{ij}$ and $T_{ij} = |(\eta_i,\phi_j)|^2$ is a doubly-stochastic
matrix.5 But $I_{\rho'} = \sum_j P_j' \ln P_j'$ and $I_\rho = \sum_i P_i \ln P_i$, with the $P_i$, $P_j'$
connected by $T_{ij}$, implies, by the theorem of information decrease for
stochastic processes (II-§6), that:

(3.9)  $I_{\rho'} \le I_\rho$ .

Moreover, it can easily be shown by a slight strengthening of the theorems
of Chapter II, §6 that strict inequality must hold unless (for each i such
that $P_i > 0$) $T_{ij} = 1$ for one j and 0 for the rest ($T_{ij} = \delta_{i k_j}$). This
means that $|(\eta_i,\phi_j)|^2 = \delta_{i k_j}$, which implies that the original mixture was
already a mixture of eigenstates of the measurement.
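
The inequality $I_{\rho'} \le I_\rho$ can be seen in the smallest possible case. The sketch assumes a two-state system in which the measured eigenstates are the eigenstates of $\rho$ rotated by an angle $\theta$, so that $T_{ij}$ has entries $\cos^2\theta$ and $\sin^2\theta$. The weights and the angle are illustrative assumptions.

```python
import math


def information(p):
    """sum_i P_i ln P_i for a probability distribution p."""
    return sum(x * math.log(x) for x in p if x > 0.0)


def transition_matrix(theta):
    """T_ij = |(eta_i, phi_j)|^2 for a basis rotated by theta (assumed form)."""
    c2, s2 = math.cos(theta) ** 2, math.sin(theta) ** 2
    return [[c2, s2], [s2, c2]]


def measure(p, t):
    """P'_j = sum_i P_i T_ij."""
    return [sum(p[i] * t[i][j] for i in range(len(p)))
            for j in range(len(t[0]))]


if __name__ == "__main__":
    p = [0.9, 0.1]
    for theta in (0.0, math.pi / 6.0, math.pi / 4.0):
        q = measure(p, transition_matrix(theta))
        print(f"theta = {theta:.3f}: P' = {q}, I = {information(q):.4f}")
```

With these assumed weights the information is unchanged only for $\theta = 0$, where the mixture already consists of eigenstates of the measurement, and it falls to its minimum, $-\ln 2$, at $\theta = \pi/4$, where the new distribution is uniform.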

We  have  answered  our  question,  and  it  is  not  possible  to  get  from  any 
mixture  to  another  by  means  of  processes  1  and  2.  There  is  an  essential 
irreversibility  to  process  1,  since  it  corresponds  to  a  stochastic  process, 
which  cannot  be  compensated  by  process  2,  which  is  reversible,  like 
classical  mechanics.6 

Our  theory  of  pure  wave  mechanics,  to  which  we  now  return,  must  give 
equivalent  results  on  the  subjective  level,  since  it  leads  to  process  1 
there.  Therefore,  measuring  processes  will  appear  to  be  irreversible  to 
any  observers  (even  though  the  composite  system  including  the  observer 
changes  its  state  reversibly). 


5 Since $\sum_i T_{ij} = \sum_i |(\eta_i,\phi_j)|^2 = \sum_i (\phi_j, [\eta_i]\phi_j) = (\phi_j, \sum_i [\eta_i]\,\phi_j) = (\phi_j, I\phi_j) = 1$,
and similarly $\sum_j T_{ij} = 1$ because $T_{ij}$ is symmetric.

6 For another, more complete, discussion of this topic in the probabilistic
interpretation see von Neumann [17], Chapter V, §4.

There is another way of looking at this apparent irreversibility within
our  theory  which  recognizes  only  Process  2.  When  an  observer  performs 
an  observation  the  result  is  a  superposition,  each  element  of  which  de¬ 
scribes  an  observer  who  has  perceived  a  particular  value.  From  this  time 
forward  there  is  no  interaction  between  the  separate  elements  of  the  super¬ 
position  (which  describe  the  observer  as  having  perceived  different  results), 
since  each  element  separately  continues  to  obey  the  wave  equation.  Each 
observer  described  by  a  particular  element  of  the  superposition  behaves 
in  the  future  completely  independently  of  any  events  in  the  remaining  ele¬ 
ments,  and  he  can  no  longer  obtain  any  information  whatsoever  concerning 
these  other  elements  (they  are  completely  unobservable  to  him). 

The  irreversibility  of  the  measuring  process  is  therefore,  within  our 
framework,  simply  a  subjective  manifestation  reflecting  the  fact  that  in 
observation  processes  the  state  of  the  observer  is  transformed  into  a 
superposition  of  observer  states,  each  element  of  which  describes  an  ob¬ 
server  who  is  irrevocably  cut  off  from  the  remaining  elements.  While  it  is 
conceivable  that  some  outside  agency  could  reverse  the  total  wave  func¬ 
tion,  such  a  change  cannot  be  brought  about  by  any  observer  which  is 
represented  by  a  single  element  of  a  superposition,  since  he  is  entirely 
powerless  to  have  any  influence  on  any  other  elements. 

There  are,  therefore,  fundamental  restrictions  to  the  knowledge  that 
an  observer  can  obtain  about  the  state  of  the  universe.  It  is  impossible 
for  any  observer  to  discover  the  total  state  function  of  any  physical  sys¬ 
tem,  since  the  process  of  observation  itself  leaves  no  independent  state 
for  the  system  or  the  observer,  but  only  a  composite  system  state  in  which 
the  object-system  states  are  inextricably  bound  up  with  the  observer  states. 
As  soon  as  the  observation  is  performed,  the  composite  state  is  split  into 
a  superposition  for  which  each  element  describes  a  different  object-system 
state  and  an  observer  with  (different)  knowledge  of  it.  Only  the  totality 
of  these  observer  states,  with  their  diverse  knowledge,  contains  complete 
information about the original object-system state, but there is no possible
communication between the observers described by these separate
states. Any single observer can therefore possess knowledge only of the
relative  state  function  (relative  to  his  state)  of  any  systems,  which  is  in 
any  case  all  that  is  of  any  importance  to  him. 

We  conclude  this  section  by  commenting  on  another  question  which 
might  be  raised  concerning  irreversible  processes:  Is  it  necessary  for 
the  existence  of  measuring  apparata,  which  can  be  correlated  to  other 
systems,  to  have  frictional  processes  which  involve  systems  of  a  large 
number  of  degrees  of  freedom?  Are  such  thermodynamically  irreversible 
processes  possible  in  the  framework  of  pure  wave  mechanics  with  a  re¬ 
versible  wave  equation,  and  if  so,  does  this  circumstance  pose  any  diffi¬ 
culties  for  our  treatment  of  measuring  processes? 

In  the  first  place,  it  is  certainly  not  necessary  for  dissipative  proces¬ 
ses  involving  additional  degrees  of  freedom  to  be  present  before  an  inter¬ 
action  which  correlates  an  apparatus  to  an  object-system  can  take  place. 
The  counter-example  is  supplied  by  the  simplified  measuring  process  of 
III-§3, which involves only a system of one coordinate and an apparatus
of  one  coordinate  and  no  further  degrees  of  freedom. 

To  the  question  whether  such  processes  are  possible  within  reversi¬ 
ble  wave  mechanics,  we  answer  yes,  in  the  same  sense  that  they  are 
present  in  classical  mechanics,  where  the  microscopic  equations  of  motion 
are  also  reversible.  This  type  of  irreversibility,  which  might  be  called 
macroscopic irreversibility, arises from a failure to separate “macroscopically
indistinguishable” states into “true” microscopic states. It has a
fundamentally different character from the irreversibility of Process 1,
which applies to micro-states as well and is peculiar to quantum mechanics.
Macroscopically irreversible phenomena are common to both classical
and quantum mechanics, since they arise from our incomplete information
concerning a system, not from any intrinsic behavior of the system.8

7 See any textbook on statistical mechanics, such as ter Haar [11], Appendix I.

8 Cf. the discussion of Chapter II, §7. See also von Neumann [17], Chapter V, §4.

Finally, even when such frictional processes are involved, they present
no new difficulties for the treatment of measuring and observation
processes  given  here.  We  imposed  no  restrictions  on  the  complexity  or 
number  of  degrees  of  freedom  of  measuring  apparatus  or  observers,  and  if 
any  of  these  processes  are  present  (such  as  heat  reservoirs,  etc.)  then 
these  systems  are  to  be  simply  included  as  part  of  the  apparatus  or  ob¬ 
server. 

§4.  Approximate  measurement 

A  phenomenon  which  is  difficult  to  understand  within  the  framework 
of  the  probabilistic  interpretation  of  quantum  mechanics  is  the  result  of 
an  approximate  measurement.  In  the  abstract  formulation  of  the  usual 
theory  there  are  two  fundamental  processes;  the  discontinuous,  probabilis¬ 
tic  Process  1  corresponding  to  precise  measurement,  and  the  continuous, 
deterministic  Process  2  corresponding  to  absence  of  any  measurement. 
What  mixture  of  probability  and  causality  are  we  to  apply  to  the  case 
where  only  an  approximate  measurement  is  effected  (i.e.,  where  the  appa¬ 
ratus  or  observer  interacts  only  weakly  and  for  a  finite  time  with  the 
object-system)? 

In  the  case  of  approximate  measurement,  we  need  to  be  supplied  with 
rules  which  will  tell  us,  for  any  initial  object-system  state,  first,  with 
what  probability  can  we  expect  the  various  possible  apparatus  readings, 
and  second,  what  new  state  to  ascribe  to  the  system  after  the  value  has 
been  observed.  We  shall  see  that  it  is  generally  impossible  to  give  these 
rules  within  a  framework  which  considers  the  apparatus  or  observer  as 
performing  an  (abstract)  observation  subject  to  Process  1,  and  that  it  is 
necessary,  in  order  to  give  a  full  account  of  approximate  measurements, 
to  treat  the  entire  system,  including  apparatus  or  observer,  wave  mechan¬ 
ically. 

The  position  that  an  approximate  measurement  results  in  the  situation 
that  the  object-system  state  is  changed  into  an  eigenstate  of  the  exact 
measurement, but for which particular one the observer has only imprecise
information, is manifestly false. It is a fact that we can make successive
approximate position measurements of particles (in cloud chambers, for
example) and use the results for somewhat reliable predictions of future
positions. However, if either of these measurements left the particle in
an “eigenstate” of position ($\delta$ function), even though the particular one
remained unknown, the momentum would have such a variance that no such
prediction would be possible. (The possibility of such predictions lies in
the correlations between position and momentum at one time with position
and momentum at a later time for wave packets, correlations which are
totally destroyed by precise measurements of either quantity.)

Instead of continuing the discussion of the inadequacy of the
probabilistic formulation, let us first investigate what actually happens in
approximate measurements, from the viewpoint of pure wave mechanics.

An approximate measurement consists of an interaction, for a finite time,
which only imperfectly correlates the apparatus (or observer) with the
object-system. We can deduce the desired rules in any particular case by
the following method: For fixed interaction and initial apparatus state
and for any initial object-system state we solve the wave equation for the
time of interaction in question. The result will be a superposition of
apparatus (observer) states and relative object-system states. Then
(according to the method of Chapter IV for assigning a measure to a
superposition) we assign a probability to each observed result equal to the
square-amplitude of the coefficient of the element which contains the
apparatus (observer) state representing the registering of that result.
Finally, the object-system is assigned the new state which is its relative
state in that element.
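The method just described can be sketched numerically for a small finite-dimensional case. The model below is an illustrative assumption, not taken from the text: the object-system has two basis states with amplitudes `a[i]`, and a hypothetical table `c[i][k]` gives the amplitude of apparatus reading k when the object is in basis state i (readings are taken to be mutually orthogonal, and each row is normalized). The function name and the numbers are likewise hypothetical.

```python
import math

def approximate_measurement(a, c):
    """For each reading k return (probability, relative object state).

    a[i]    : amplitude of object basis state i before the interaction
    c[i][k] : hypothetical amplitude of apparatus reading k given object state i
    After the interaction the total state is sum_{i,k} a[i]*c[i][k] |i>|k>.
    """
    results = []
    for k in range(len(c[0])):
        # Unnormalized relative object state for reading k.
        rel = [a[i] * c[i][k] for i in range(len(a))]
        # Square-amplitude of the element containing reading k.
        prob = sum(abs(x) ** 2 for x in rel)
        state = [x / math.sqrt(prob) for x in rel] if prob > 0 else rel
        results.append((prob, state))
    return results

# Object in an equal superposition; apparatus only weakly distinguishes the states.
a = [1 / math.sqrt(2), 1 / math.sqrt(2)]
c = [[math.sqrt(0.8), math.sqrt(0.2)],   # object state 0 mostly gives reading 0
     [math.sqrt(0.2), math.sqrt(0.8)]]   # object state 1 mostly gives reading 1
outcomes = approximate_measurement(a, c)
for k, (prob, state) in enumerate(outcomes):
    print(k, round(prob, 3), [round(abs(x) ** 2, 3) for x in state])
```

In this sketch the relative state after either reading is still a superposition of both object states, only weighted toward one of them, which is the sense in which the correlation is imperfect.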

9 See Bohm [1], p. 202.

For example, let us consider the measuring process described in
Chapter III, §3, which is an excellent model for an approximate measurement.
After the interaction, the total state was found to be (III-(3.12)):

(4.1)  $\psi_t^{S+A} = \int \frac{1}{N_{r'}}\, \xi^{r'}(q)\, \delta(r-r')\, dr'$ .

Then, according to our prescription, we assign the probability density
P(r') to the observation of the apparatus coordinate r':

(4.2)  $P(r') = \left|\frac{1}{N_{r'}}\right|^2 = \int \phi^*\phi(q)\, \eta^*\eta(r'-qt)\, dq$ ,

which is the square amplitude of the coefficient $1/N_{r'}$ of the element
$\xi^{r'}(q)\,\delta(r-r')$ of the superposition (4.1) in which the apparatus
coordinate has the value r = r'. Then, depending upon the observed apparatus
coordinate r', we assign the object-system the new state

(4.3)  $\xi^{r'}(q) = N_{r'}\, \phi(q)\, \eta(r'-qt)$

(where $\phi(q)$ is the old state, and $\eta(r)$ is the initial apparatus state),
which is the relative object-system state in (4.1) for apparatus coordinate r'.

This example supplies the counter-example to another conceivable
method of dealing with approximate measurement within the framework of
Process 1. This is the position that when an approximate measurement
of a quantity Q is performed, in actuality another quantity Q' is
precisely measured, where the eigenstates of Q' correspond to fairly
well-defined (i.e., sharply peaked distributions for) Q values.¹⁰ However,
any such scheme based on Process 1 always has the prescription that
after the measurement, the (unnormalized) new state function results from
the old by a projection (on an eigenstate or eigenspace), which depends
upon the observed value. If this is true, then in the above example the
new state $\xi^{r'}(q)$ must result from the old, $\phi(q)$, by a projection E:

(4.4)  $\xi^{r'}(q) = N E \phi(q) = N_{r'}\, \phi(q)\, \eta(r'-qt)$

(where N, $N_{r'}$ are normalization constants). But E is only a projection
if $E^2 = E$. Applying the operation (4.4) twice, we get:

(4.5)  $E(N E \phi(q)) = N E^2 \phi(q) = N' \phi(q)\, \eta^2(r'-qt) \;\Rightarrow\; E^2 \phi(q) = \frac{N'}{N}\, \phi(q)\, \eta^2(r'-qt)$ ,

and we see that E cannot be a projection unless $\eta(q) = \eta^2(q)$ for all
q (i.e., $\eta(q) = 0$ or 1 for all q), and we have arrived at a contradiction
to the assumption that in all cases the changes of states for approximate
measurements are governed by projections. (In certain special cases,
such as approximate position measurements with slits or Geiger counters,¹¹
the new functions arise from the old by multiplication by sharp cutoff
functions which are 1 over the slit or counter and 0 elsewhere, so that
these measurements can be handled by projections.)

10 Cf. von Neumann [17], Chapter IV, §4.
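The idempotency argument can be checked numerically in a minimal sketch. It assumes, for illustration only, that the state change is multiplication by a real function η evaluated on a grid, and it ignores the normalization constants; the Gaussian packet, the slit width, and all function names are hypothetical choices rather than anything specified in the text.

```python
import math

def multiply_twice_differs(eta, points, tol=1e-12):
    """Return True if multiplying by eta twice differs from multiplying once.

    A multiplication operator (E f)(q) = eta(q) f(q) satisfies E^2 = E
    only if eta(q)**2 == eta(q) at every q, i.e. eta(q) is 0 or 1.
    """
    return any(abs(eta(q) ** 2 - eta(q)) > tol for q in points)

def gaussian(q):
    # Smooth apparatus packet: values strictly between 0 and 1 away from q = 0.
    return math.exp(-q * q)

def slit(q):
    # Sharp cutoff: 1 over the slit |q| <= 1, 0 elsewhere.
    return 1.0 if abs(q) <= 1.0 else 0.0

grid = [i / 10 for i in range(-30, 31)]
print(multiply_twice_differs(gaussian, grid))  # smooth packet: E^2 != E
print(multiply_twice_differs(slit, grid))      # sharp cutoff: E^2 == E
```

The smooth packet fails the test and so cannot act as a projection, while the sharp cutoff passes, matching the special case of slits and counters noted above.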

One cannot, therefore, account for approximate measurements by any
scheme based on Process 1, and it is necessary to investigate these
processes entirely wave-mechanically. Our viewpoint constitutes a
framework in which it is possible to make precise deductions about such
measurements and observations, since we can follow in detail the interaction
of an observer or apparatus with an object-system.

§5.  Discussion  of  a  spin  measurement  example 

We shall conclude this chapter with a discussion of an instructive
example of Bohm.¹² Bohm considers the measurement of the z component
of the angular momentum of an atom, whose total angular momentum is ħ/2,
which is brought about by a Stern-Gerlach experiment. The measurement
is accomplished by passing an atomic beam through an inhomogeneous
magnetic field, which has the effect of giving the particle a momentum
which is directed up or down depending upon whether the spin was up or
down.

11 Cf. §2, this chapter.
12 Bohm [1], p. 593.

The measurement is treated as impulsive, so that during the time that
the atom passes through the field the Hamiltonian is taken to be simply
the interaction:

(5.1)  $H_I = \mu(\mathbf{S}\cdot\boldsymbol{\mathcal{H}})$,  $\mu = \ldots$ ,

where $\boldsymbol{\mathcal{H}}$ is the magnetic field and $\mathbf{S}$ the spin operator for the atom. The
particle is presumed to pass through a region of the field where the field
is in the z direction, so that during the time of transit the field is
approximately $\mathcal{H}_z \cong \mathcal{H}_0 + z\mathcal{H}_0'$ (where $\mathcal{H}_0 = (\mathcal{H}_z)_{z=0}$ and
$\mathcal{H}_0' = (\partial\mathcal{H}_z/\partial z)_{z=0}$), and hence the interaction is approximately:

(5.2)  $H_I \cong \mu(\mathcal{H}_0 + z\mathcal{H}_0')\, S_z$ ,

where $S_z$ denotes the operator for the z component of the spin.

It is assumed that the state of the atom, just prior to entry into the
field, is a wave packet of the form:

(5.3)  $\psi_0 = f_0(z)(c_+ v_+ + c_- v_-)$ ,

where $v_+$ and $v_-$ are the spin functions for $S_z = 1$ and $-1$ respectively.
Solving the Schrödinger equation for the Hamiltonian (5.2) and
initial condition (5.3) yields the state for a later time t:

(5.4)  $\psi_t = f_0(z)\left( c_+ e^{-i\mu(\mathcal{H}_0 + z\mathcal{H}_0')t/\hbar}\, v_+ + c_- e^{+i\mu(\mathcal{H}_0 + z\mathcal{H}_0')t/\hbar}\, v_- \right)$ .

Therefore, if Δt is the time that it takes the atom to traverse the field,¹³
each component of the wave packet has been multiplied by a phase factor
$e^{\pm i\mu(\mathcal{H}_0 + z\mathcal{H}_0')\Delta t/\hbar}$, i.e., has had its mean momentum in the z direction
changed by an amount $\pm\mathcal{H}_0'\mu\,\Delta t$, depending upon the spin direction. Thus
the initial wave packet (with mean momentum zero) is split into a
superposition of two packets, one with mean z-momentum $+\mathcal{H}_0'\mu\,\Delta t$ and spin
up, and the other with spin down and mean z-momentum $-\mathcal{H}_0'\mu\,\Delta t$.

13 This time is, strictly speaking, not well defined. The results, however, do
not depend critically upon it.

The interaction (5.2) has therefore served to correlate the spin with
the momentum in the z-direction. These two packets of the resulting
superposition now move in opposite z-directions, so that after a short
time they become widely separated (provided that the momentum changes
$\pm\mathcal{H}_0'\mu\,\Delta t$ are large compared to the momentum spread of the original
packet), and the z-coordinate is itself then correlated with the spin,
representing the “apparatus” coordinate in this case. The Stern-Gerlach
apparatus therefore splits an incoming wave packet into a superposition
of two diverging packets, corresponding to the two spin values.

We take this opportunity to caution against a certain viewpoint which
can lead to difficulties. This is the idea that, after an apparatus has
interacted with a system, in “actuality” one or another of the elements
of the resultant superposition described by the composite state-function
has been realized to the exclusion of the rest, the existing one simply
being unknown to an external observer (i.e., that instead of the
superposition there is a genuine mixture). This position must be erroneous
since there is always the possibility for the external observer to make
use of interference properties between the elements of the superposition.

In the present example, for instance, it is in principle possible to
deflect the two beams back toward one another with magnetic fields and
recombine them in another inhomogeneous field, which duplicates the first,
in such a manner that the original spin state (before entering the
apparatus) is restored.¹⁴ This would not be possible if the original
Stern-Gerlach apparatus performed the function of converting the original wave
packet into a non-interfering mixture of packets for the two spin cases.
Therefore the position that after the atom has passed through the
inhomogeneous field it is “really” in one or the other beam with the
corresponding spin, although we are ignorant of which one, is incorrect.

14 As pointed out by Bohm [1], p. 604.

After two systems have interacted and become correlated it is true
that marginal expectations for subsystem operators can be calculated
correctly when the composite system is represented by a certain
non-interfering mixture of states. Thus if the composite system state is
$\psi^{S_1+S_2} = \sum_i a_i\, \eta_i^{S_1} \phi_i^{S_2}$, where the $\{\phi_i^{S_2}\}$ are orthogonal, then for
purposes of calculating the expectations of operators on $S_1$ the state
$\psi^{S_1+S_2}$ is equivalent to the non-interfering mixture of states $\eta_i^{S_1}$
weighted by $P_i = a_i^* a_i$, and one can take the picture that one or another
of the cases $\eta_i^{S_1}$ has been realized to the exclusion of the rest, with
probabilities $P_i$.¹⁵

However, this representation by a mixture must be regarded as only a
mathematical artifice which, although useful in many cases, is an
incomplete description because it ignores phase relations between the separate
elements which actually exist, and which become important in any
interactions which involve more than just a subsystem.

In the present example, the “composite system” is made of the
“subsystems” spin value (object-system) and z-coordinate (apparatus), and
the superposition of the two diverging wave packets is the state after
interaction. It is only correct to regard this state as a mixture so long as
any contemplated future interactions or measurements will involve only
the spin value or only the z-coordinate, but not both simultaneously. As
we saw, phase relations between the two packets are present and become
important when they are deflected back and recombined in another
inhomogeneous field, a process involving the spin values and z-coordinate
simultaneously.
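A small numerical sketch may make the role of the phase relations concrete. It assumes, purely for illustration, that the two z-packets enter only through their overlap ⟨A|B⟩ (0 when widely separated, 1 when perfectly recombined) and evaluates the expectation of the spin x-component from the off-diagonal element of the reduced spin density matrix. The function name and the choice of observable are assumptions, not part of the original discussion.

```python
import math

def sigma_x_expectation(c_up, c_down, overlap):
    """<sigma_x> for the state c_up|up>|A> + c_down|down>|B>.

    overlap is the inner product <A|B> of the two (normalized) z-packets:
    0 when the packets are widely separated, 1 when they are recombined.
    """
    # Off-diagonal element of the reduced spin density matrix.
    rho_up_down = c_up * c_down.conjugate() * overlap.conjugate()
    return 2 * rho_up_down.real

c = complex(1 / math.sqrt(2))
separated = sigma_x_expectation(c, c, complex(0.0))   # behaves like a mixture
recombined = sigma_x_expectation(c, c, complex(1.0))  # phase relation is back
print(separated, recombined)
```

While the packets are separated, the spin alone is indistinguishable from a mixture; once they are recombined the interference term returns, which could not happen had one element been realized to the exclusion of the other.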


15 See Chapter III, §1.

It is therefore improper to attribute any less validity or “reality” to
any element of a superposition than any other element, due to this ever
present possibility of obtaining interference effects between the elements.
All elements of a superposition must be regarded as simultaneously
existing.

At this time we should like to add a few remarks concerning the notion
of transition probabilities in quantum mechanics. Often one considers a
system, with Hamiltonian H and stationary states $\{\phi_i\}$, to be perturbed
for a time by a time-dependent addition to the Hamiltonian, $H_I(t)$. Then
under the action of the perturbed Hamiltonian $H' = H + H_I(t)$ the states
$\{\phi_i\}$ are generally no longer stationary but change after time t into new
states $\{\psi_i(t)\}$:

(5.5)  $\phi_i \rightarrow \psi_i(t) = \sum_j (\phi_j, \psi_i(t))\, \phi_j = \sum_j a_{ij}(t)\, \phi_j$ ,

which can be represented as a superposition of the old stationary states
with time-dependent coefficients $a_{ij}(t)$.

If at time τ a measurement with eigenstates $\phi_j$ is performed, such
as an energy measurement (whose operator is the original H), then
according to the probabilistic interpretation the probability for finding the
state $\phi_j$, given that the state was originally $\phi_i$, is $P_{ij}(\tau) = |a_{ij}(\tau)|^2$.
The quantities $|a_{ij}(\tau)|^2$ are often referred to as transition probabilities.
In this case, however, the name is a misnomer, since it carries the
connotation that the original state $\phi_i$ is transformed into a mixture (of the $\phi_j$
weighted by $P_{ij}(\tau)$), and gives the erroneous impression that the quantum
formalism itself implies the existence of quantum-jumps (stochastic
processes) independent of acts of observation. This is incorrect since there
is still a pure state $\sum_j a_{ij}(\tau)\, \phi_j$ with phase relations between the $\phi_j$,
and expectations of operators other than the energy must be calculated
from the superposition and not the mixture.
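The difference between the two calculations can be written out explicitly. In the following sketch A is an arbitrary operator and $A_{jk} = (\phi_j, A\phi_k)$ its matrix elements in the stationary states; this notation and the energy eigenvalues $E_j$ are introduced here for illustration and do not appear in the text.

```latex
% Expectation in the pure state \psi_i(\tau) = \sum_j a_{ij}(\tau)\,\phi_j
\langle A \rangle_{\text{pure}}
  = \sum_{j,k} a_{ij}^*(\tau)\, a_{ik}(\tau)\, A_{jk}
  = \sum_j |a_{ij}(\tau)|^2 A_{jj}
  + \sum_{j \neq k} a_{ij}^*(\tau)\, a_{ik}(\tau)\, A_{jk}

% Expectation in the mixture of the \phi_j weighted by P_{ij}(\tau)
\langle A \rangle_{\text{mix}} = \sum_j |a_{ij}(\tau)|^2 A_{jj}

% For the energy, A = H with H_{jk} = E_j \delta_{jk}, the cross terms vanish
\langle H \rangle_{\text{pure}} = \langle H \rangle_{\text{mix}}
  = \sum_j |a_{ij}(\tau)|^2 E_j
```

The two expectations agree whenever A is diagonal in the $\phi_j$, as the energy is, and differ by the cross terms otherwise; those cross terms are the phase relations that the mixture picture discards.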

There is another case, however, the one usually encountered in fact,
where the transition probability concept is somewhat more justified. This
is the case in which the perturbation is due to interaction of the system
$S_1$ with another system $S_2$, and not simply a time dependence of $S_1$’s
Hamiltonian as in the case just considered. In this situation the
interaction produces a composite system state, for which there are in general no
independent subsystem states. However, as we have seen, for purposes
of calculating expectations of operators on $S_1$ alone, we can regard $S_1$
as being represented by a certain mixture. According to this picture the
states of subsystem $S_1$ are gradually converted into mixtures by the
interaction with $S_2$ and the concept of transition probability makes some
sense. Of course, it must be remembered that this picture is only
justified so long as further measurements on $S_1$ alone are contemplated, and
any attempt to make a simultaneous determination in $S_1$ and $S_2$ involves
the composite state where interference properties may be important.

An  example  is  a  hydrogen  atom  interacting  with  the  electromagnetic 
field.  After  a  time  of  interaction  we  can  picture  the  atom  as  being  in  a 
mixture  of  its  states,  so  long  as  we  consider  future  measurements  on  the 
atom  only.  But  in  actuality  the  state  of  the  atom  is  dependent  upon 
(correlated  with)  the  state  of  the  field,  and  some  process  involving  both 
atom  and  field  could  conceivably  depend  on  interference  effects  between 
the  states  of  the  alleged  mixture.  With  these  restrictions,  however,  the 
concept  of  transition  probability  is  quite  useful  and  justified. 


VI.  DISCUSSION 


We have shown that our theory based on pure wave mechanics, which
takes as the basic description of physical systems the state function
(supposed to be an objective description, i.e., in one-one, rather than
statistical, correspondence to the behavior of the system), can be put in
satisfactory correspondence with experience. We saw that the probabilistic
assertions of the usual interpretation of quantum mechanics can be
deduced from this theory, in a manner analogous to the methods of classical
statistical mechanics, as subjective appearances to observers,
observers which were regarded simply as physical systems subject to the
same type of description and laws as any other systems, and having no
preferred position. The theory is therefore capable of supplying us with
a complete conceptual model of the universe, consistent with the
assumption that it contains more than one observer.

Because the theory gives us an objective description, it constitutes a
framework in which a number of puzzling subjects (such as classical level
phenomena, the measuring process itself, the inter-relationship of several
observers, questions of reversibility and irreversibility, etc.) can be
investigated in detail in a logically consistent manner. It supplies a new
way of viewing processes, which clarifies many apparent paradoxes of the
usual interpretation;¹ indeed, it constitutes an objective framework in
which it is possible to understand the general consistency of the ordinary
view.

1 Such as that of Einstein, Rosen, and Podolsky [8], as well as the paradox of
the introduction.

We shall now resume our discussion of alternative interpretations.
There has been expressed lately a great deal of dissatisfaction with the
present form of quantum theory by a number of authors, and a wide variety
of new interpretations have sprung into existence. We shall now attempt
to classify briefly a number of these interpretations, and comment upon
them.

a. The “popular” interpretation. This is the scheme alluded to in
the introduction, where ψ is regarded as objectively characterizing
the single system, obeying a deterministic wave equation when
the system is isolated but changing probabilistically and
discontinuously under observation.

In its unrestricted form this view can lead to paradoxes like that
mentioned in the introduction, and is therefore untenable. However, this view
is consistent so long as it is assumed that there is only one observer in
the universe (the solipsist position, Alternative 1 of the Introduction).
This consistency is most easily understood from the viewpoint of our own
theory, where we were able to show that all phenomena will seem to follow
the predictions of this scheme to any observer. Our theory therefore
justifies the personal adoption of this probabilistic interpretation, for purposes
of making practical predictions, from a more satisfactory framework.

b. The Copenhagen interpretation. This is the interpretation developed
by Bohr. The ψ function is not regarded as an objective description
of a physical system (i.e., it is in no sense a conceptual
model), but is regarded as merely a mathematical artifice which
enables one to make statistical predictions, albeit the best
predictions which it is possible to make. This interpretation in fact
denies the very possibility of a single conceptual model applicable
to the quantum realm, and asserts that the totality of phenomena
can only be understood by the use of different, mutually exclusive
(i.e., “complementary”) models in different situations. All
statements about microscopic phenomena are regarded as meaningless
unless accompanied by a complete description (classical) of an
experimental arrangement.

While undoubtedly safe from contradiction, due to its extreme
conservatism, it is perhaps overcautious. We do not believe that the primary
purpose of theoretical physics is to construct “safe” theories at severe cost
in the applicability of their concepts, which is a sterile occupation, but
to make useful models which serve for a time and are replaced as they are
outworn.²

Another objectionable feature of this position is its strong reliance
upon the classical level from the outset, which precludes any possibility
of explaining this level on the basis of an underlying quantum theory. (The
deduction of classical phenomena from quantum theory is impossible simply
because no meaningful statements can be made without pre-existing
classical apparatus to serve as a reference frame.) This interpretation suffers
from the dualism of adhering to a “reality” concept (i.e., the possibility
of objective description) on the classical level but renouncing the same
in the quantum domain.

c. The “hidden variables” interpretation. This is the position
(Alternative 4 of the Introduction) that ψ is not a complete
description of a single system. It is assumed that the correct
complete description, which would involve further (hidden) parameters,
would lead to a deterministic theory, from which the probabilistic
aspects arise as a result of our ignorance of these extra parameters
in the same manner as in classical statistical mechanics.

2 Cf. Appendix II.

The ψ-function is therefore regarded as a description of an ensemble
of systems rather than a single system. Proponents of this interpretation
include Einstein,³ Bohm,⁴ Wiener and Siegal.⁵

Einstein hopes that a theory along the lines of his general relativity,
where all of physics is reduced to the geometry of space-time, could
satisfactorily explain quantum effects. In such a theory a particle is no longer
a simple object but possesses an enormous amount of structure (i.e., it is
thought of as a region of space-time of high curvature). It is conceivable
that the interactions of such “particles” would depend in a sensitive way
upon the details of this structure, which would then play the role of the
“hidden variables.”⁶ However, these theories are non-linear and it is
enormously difficult to obtain any conclusive results. Nevertheless, the
possibility cannot be discounted.

Bohm considers ψ to be a real force field acting on a particle which
always has a well-defined position and momentum (which are the hidden
variables of this theory). The ψ-field satisfying Schrödinger’s equation
is pictured as somewhat analogous to the electromagnetic field satisfying
Maxwell’s equations, although for systems of n particles the ψ-field is
in a 3n-dimensional space. With this theory Bohm succeeds in showing
that in all actual cases of measurement the best predictions that can be
made are those of the usual theory, so that no experiments could ever rule
out his interpretation in favor of the ordinary theory. Our main criticism
of this view is on the grounds of simplicity: if one desires to hold the
view that ψ is a real field then the associated particle is superfluous
since, as we have endeavored to illustrate, the pure wave theory is itself
satisfactory.

3 Einstein [7].
4 Bohm [2].
5 Wiener and Siegal [20].
6 For an example of this type of theory see Einstein and Rosen [9].

Wiener and Siegal have developed a theory which is more closely tied
to the formalism of quantum mechanics. From the set N of all
non-degenerate linear Hermitian operators for a system having a complete set
of eigenstates, a subset I is chosen such that no two members of I
commute and every element outside I commutes with at least one element of
I. The set I therefore contains precisely one operator for every
orientation of the principal axes of the Hilbert space for the system. It is
postulated that each of the operators of I corresponds to an independent
observable which can take any of the real numerical values of the spectrum
of the operator. This theory, in its present form, is a theory of infinitely
many⁷ “hidden variables,” since a system is pictured as possessing (at
each instant) a value for every one of these “observables” simultaneously,
with the changes in these values obeying precise (deterministic) dynamical
laws. However, the change of any one of these variables with time depends
upon the entire set of observables, so that it is impossible ever to discover
by measurement the complete set of values for a system (since only one
“observable” at a time can be observed). Therefore, statistical ensembles
are introduced, in which the values of all of the observables are related to
points in a “differential space,” which is a Hilbert space containing a
measure for which each (differential space) coordinate has an independent
normal distribution. It is then shown that the resulting statistical dynamics
is in accord with the usual form of quantum theory.

It cannot be disputed that these theories are often appealing, and might
conceivably become important should future discoveries indicate serious
inadequacies in the present scheme (i.e., they might be more easily
modified to encompass new experience). But from our viewpoint they are
usually more cumbersome than the conceptually simpler theory based on
pure wave mechanics. Nevertheless, these theories are of great theoretical
importance because they provide us with examples that “hidden variables”
theories are indeed possible.


7 A non-denumerable infinity, in fact, since the set I is uncountable!

d. The stochastic process interpretation. This is the point of view
which holds that the fundamental processes of nature are stochastic
(i.e., probabilistic) processes. According to this picture
physical systems are supposed to exist at all times in definite
states, but the states are continually undergoing probabilistic
changes. The discontinuous probabilistic “quantum-jumps” are
not associated with acts of observation, but are fundamental to the
systems themselves.

A stochastic theory which emphasizes the particle, rather than wave,
aspects of quantum theory has been investigated by Bopp.⁸ The particles
do not obey deterministic laws of motion, but rather probabilistic laws,
and by developing a general “correlation statistics” Bopp shows that his
quantum scheme is a special case which gives results in accord with the
usual theory. (This accord is only approximate and in principle one could
decide between the theories. The approximation is so close, however,
that it is hardly conceivable that a decision would be practically feasible.)

Bopp’s theory seems to stem from a desire to have a theory founded
upon particles rather than waves, since it is this particle aspect (highly
localized phenomena) which is most frequently encountered in present day
high-energy experiments (cloud chamber tracks, etc.). However, it seems
to us to be much easier to understand particle aspects from a wave picture
(concentrated wave packets) than it is to understand wave aspects
(diffraction, interference, etc.) from a particle picture.

Nevertheless, there can be no fundamental objection to the idea of a
stochastic theory, except on grounds of a naked prejudice for determinism.
The question of determinism or indeterminism in nature is obviously
forever undecidable in physics, since for any current deterministic
[probabilistic] theory one could always postulate that a refinement of the theory
would disclose a probabilistic [deterministic] substructure, and that the
current deterministic [probabilistic] theory is to be explained in terms of
the refined theory on the basis of the law of large numbers [ignorance of
hidden variables]. However, it is quite another matter to object to a
mixture of the two where the probabilistic processes occur only with acts of
observation.

e.  The  wave  interpretation.  This  is  the  position  proposed  in  the 
present  thesis,  in  which  the  wave  function  itself  is  held  to  be  the 
fundamental  entity,  obeying  at  all  times  a  deterministic  wave 
equation. 

This view also corresponds most closely with that held by Schrödinger.⁹
However, this picture only makes sense when observation processes
themselves are treated within the theory. It is only in this manner that the
apparent existence of definite macroscopic objects, as well as localized
phenomena, such as tracks in cloud chambers, can be satisfactorily
explained in a wave theory where the waves are continually diffusing. With
the deduction in this theory that phenomena will appear to observers to be
subject to Process 1, Heisenberg’s criticism¹⁰ of Schrödinger’s opinion
(that continuous wave mechanics could not seem to explain the
discontinuities which are everywhere observed) is effectively met. The
“quantum-jumps” exist in our theory as relative phenomena (i.e., the states of an
object-system relative to chosen observer states show this effect), while
the absolute states change quite continuously.

The  wave  theory  is  definitely  tenable  and  forms,  we  believe,  the 
simplest  complete,  self-consistent  theory. 


9 Schrödinger [18].
10 Heisenberg [14].

We should like now to comment on some views expressed by Einstein.
Einstein’s¹¹ criticism of quantum theory (which is actually directed more
against what we have called the “popular” view than Bohr’s
interpretation) is mainly concerned with the drastic changes of state brought about
by simple acts of observation (i.e., the infinitely rapid collapse of wave
functions), particularly in connection with correlated systems which are
widely separated so as to be mechanically uncoupled at the time of
observation.¹² At another time he put his feeling colorfully by stating that he
could not believe that a mouse could bring about drastic changes in the
universe simply by looking at it.¹³

However, from the standpoint of our theory, it is not so much the system
which is affected by an observation as the observer, who becomes correlated
to the system.

In the case of observation of one system of a pair of spatially separated,
correlated systems, nothing happens to the remote system to make any of its
states more “real” than the rest. It had no independent states to begin with,
but a number of states occurring in a superposition with corresponding states
for the other (near) system. Observation of the near system simply correlates
the observer to this system, a purely local process, but a process which also
entails automatic correlation with the remote system. Each state of the remote
system still exists with the same amplitude in a superposition, but now a
superposition for which each element contains, in addition to a remote system
state and correlated near system state, an observer state which describes an
observer who perceives the state of the near system.¹⁴ From the present
viewpoint all elements of this superposition are equally “real.” Only the
observer state has changed, so as to become correlated with the state of the
near system and hence naturally with that of the remote system also. The mouse
does not affect the universe; only the mouse is affected.

11 Einstein [7].

12 For example, the paradox of Einstein, Rosen, and Podolsky [8].

13 Address delivered at Palmer Physical Laboratory, Princeton, Spring, 1954.

14 See in this connection Chapter IV, particularly pp. 82, 83.
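
The argument about the remote system can be illustrated with a small numerical toy. The sketch below is not part of the original text: the two-valued "up"/"down" systems, the amplitudes, and the helper names `observe_near` and `remote_density` are all assumptions chosen for illustration. It represents the total state as a table of amplitudes over (near system, remote system, observer memory), lets the observer become correlated with the near system only, and compares the reduced density matrix of the remote system before and after.

```python
from collections import defaultdict
import math

def observe_near(state):
    """Toy local interaction: the observer's memory records the near-system value.

    Assumes every element of `state` starts with the same "ready" memory, so the
    map only relabels the memory and leaves every amplitude untouched.
    """
    return {(near, remote, f"saw {near}"): amp
            for (near, remote, _memory), amp in state.items()}

def remote_density(state):
    """Reduced density matrix of the remote system (near system and observer traced out)."""
    rho = defaultdict(complex)
    for (n1, r1, m1), a1 in state.items():
        for (n2, r2, m2), a2 in state.items():
            if n1 == n2 and m1 == m2:
                rho[(r1, r2)] += a1 * a2.conjugate()
    return dict(rho)

# Hypothetical anti-correlated pair with unequal amplitudes.
c_up, c_down = math.sqrt(0.3), math.sqrt(0.7)
before = {("up", "down", "ready"): c_up,
          ("down", "up", "ready"): c_down}
after = observe_near(before)

rho_before = remote_density(before)
rho_after = remote_density(after)
print(rho_before)
print(rho_after)
```

In this toy the two printed matrices agree: each remote state keeps its weight, and the only thing that has changed is the observer's memory label, which is the sense in which "only the mouse is affected."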

Our  theory  in  a  certain  sense  bridges  the  positions  of  Einstein  and 
Bohr,  since  the  complete  theory  is  quite  objective  and  deterministic  (“God 
does  not  play  dice  with  the  universe”),  and  yet  on  the  subjective  level, 
of  assertions  relative  to  observer  states,  it  is  probabilistic  in  the  strong 
sense  that  there  is  no  way  for  observers  to  make  any  predictions  better 
than the limitations imposed by the uncertainty principle.¹⁵

In conclusion, we have seen that if we wish to adhere to objective
descriptions then the principle of the psycho-physical parallelism requires
that we should be able to consider some mechanical devices as representing
observers. The situation is then that such devices must either cause the
probabilistic discontinuities of Process 1, or must be transformed into the
superpositions we have discussed. We are forced to abandon the former
possibility since it leads to the situation that some physical systems would
obey different laws from the rest, with no clear means for distinguishing
between these two types of systems. We are thus led to our present theory
which results from the complete abandonment of Process 1 as a basic process.
Nevertheless, within the context of this theory, which is objectively
deterministic, it develops that the probabilistic aspects of Process 1
reappear at the subjective level, as relative phenomena to observers.

One is thus free to build a conceptual model of the universe, which
postulates only the existence of a universal wave function which obeys a
linear wave equation. One then investigates the internal correlations in this
wave function with the aim of deducing laws of physics, which are statements
that take the form: Under the conditions C the property A of a subsystem of
the universe (subset of the total collection of coordinates for the wave
function) is correlated with the property B of another subsystem (with the
manner of correlation being specified). For example, the classical mechanics
of a system of massive particles becomes a law which expresses the correlation
between the positions and momenta (approximate) of the particles at one time
with those at another time.¹⁶ All statements about subsystems then become
relative statements, i.e., statements about the subsystem relative to a
prescribed state for the remainder (since this is generally the only way a
subsystem even possesses a unique state), and all laws are correlation laws.

15 Cf. Chapter V, §2.

The theory based on pure wave mechanics is a conceptually simple causal
theory, which fully maintains the principle of the psycho-physical
parallelism. It therefore forms a framework in which it is possible to discuss
(in addition to ordinary phenomena) observation processes themselves,
including the inter-relationships of several observers, in a logical,
unambiguous fashion. In addition, all of the correlation paradoxes, like that
of Einstein, Rosen, and Podolsky,¹⁷ find easy explanation.

While our theory justifies the personal use of the probabilistic
interpretation as an aid to making practical predictions, it forms a broader
frame in which to understand the consistency of that interpretation. It
transcends the probabilistic theory, however, in its ability to deal logically
with questions of imperfect observation and approximate measurement.

Since this viewpoint will be applicable to all forms of quantum mechanics
which maintain the superposition principle, it may prove a fruitful framework
for the interpretation of new quantum formalisms. Field theories, particularly
any which might be relativistic in the sense of general relativity, might
benefit from this position, since one is free to construct formal
(non-probabilistic) theories, and supply any possible statistical
interpretations later. (This viewpoint avoids the necessity of considering
anomalous probabilistic jumps scattered about space-time, and one can assert
that field equations are satisfied everywhere and everywhen, then deduce any
statistical assertions by the present method.)

16 Cf. Chapter V, §2.

17 Einstein, Rosen, and Podolsky [8].

By focusing attention upon questions of correlations, one may be able to
deduce useful relations (correlation laws analogous to those of classical
mechanics) for theories which at present do not possess known classical
counterparts. Quantized fields do not generally possess pointwise independent
field values, the values at one point of space-time being correlated with
those at neighboring points of space-time in a manner, it is to be expected,
approximating the behavior of their classical counterparts. If correlations
are important in systems with only a finite number of degrees of freedom, how
much more important they must be for systems of infinitely many coordinates.

Finally, aside from any possible practical advantages of the theory, it
remains a matter of intellectual interest that the statistical assertions of
the usual interpretation do not have the status of independent hypotheses,
but are deducible (in the present sense) from the pure wave mechanics, which
results from their omission.


APPENDIX  I 


We  shall  now  supply  the  proofs  of  a  number  of  assertions  which  have 
been  made  in  the  text. 


§1.  Proof  of  Theorem  1 

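We now show that $\{X,Y,\ldots,Z\} > 0$ unless $X,Y,\ldots,Z$ are independent
random variables. Abbreviate $P(x_i,y_j,\ldots,z_k)$ by $P_{ij\ldots k}$, and let

$$
Q_{ij\ldots k} =
\begin{cases}
\dfrac{P_{ij\ldots k}}{P_i P_j \cdots P_k} & \text{if } P_i P_j \cdots P_k > 0 \\[1ex]
1 & \text{if } P_i P_j \cdots P_k = 0
\end{cases}
\tag{1.1}
$$

(Note that $P_i P_j \cdots P_k = 0$ implies that also $P_{ij\ldots k} = 0$.) Then always

$$ P_{ij\ldots k} = Q_{ij\ldots k}\, P_i P_j \cdots P_k , \tag{1.2} $$

and we have

$$
\{X,Y,\ldots,Z\}
= \operatorname{Exp}\left[\ln \frac{P_{ij\ldots k}}{P_i P_j \cdots P_k}\right]
= \operatorname{Exp}\left[\ln Q_{ij\ldots k}\right]
= \sum_{ij\ldots k} P_i P_j \cdots P_k\, Q_{ij\ldots k} \ln Q_{ij\ldots k} .
\tag{1.3}
$$

Applying the inequality for $x \ge 0$:

$$ x \ln x > x - 1 \quad (\text{except for } x = 1) \tag{1.4} $$

(which is easily established by calculating the minimum of $x \ln x - (x-1)$)
to (1.3) we have:

$$
P_i P_j \cdots P_k\, Q_{ij\ldots k} \ln Q_{ij\ldots k}
> P_i P_j \cdots P_k\, (Q_{ij\ldots k} - 1)
\quad (\text{unless } Q_{ij\ldots k} = 1) .
\tag{1.5}
$$

Therefore we have for the sum:

$$
\sum_{ij\ldots k} P_i P_j \cdots P_k\, Q_{ij\ldots k} \ln Q_{ij\ldots k}
> \sum_{ij\ldots k} P_i P_j \cdots P_k\, Q_{ij\ldots k}
- \sum_{ij\ldots k} P_i P_j \cdots P_k
\tag{1.6}
$$

unless all $Q_{ij\ldots k} = 1$. But
$\sum_{ij\ldots k} P_i P_j \cdots P_k\, Q_{ij\ldots k} = \sum_{ij\ldots k} P_{ij\ldots k} = 1$,
and $\sum_{ij\ldots k} P_i P_j \cdots P_k = 1$, so that the right side of (1.6)
vanishes. The left side is, by (1.3), the correlation $\{X,Y,\ldots,Z\}$, and
the condition that all of the $Q_{ij\ldots k}$ equal one is precisely the
independence condition that $P_{ij\ldots k} = P_i P_j \cdots P_k$ for all
$i,j,\ldots,k$. We have therefore proved that

$$ \{X,Y,\ldots,Z\} > 0 \tag{1.7} $$

unless $X,Y,\ldots,Z$ are mutually independent.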
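
As a numerical illustration of the result just proved, the following sketch evaluates the two-variable correlation $\{X,Y\} = \sum_{ij} P_{ij} \ln (P_{ij}/P_i P_j)$ for one independent and one dependent joint distribution. The function name `correlation` and the two example tables are assumptions introduced here, not taken from the text.

```python
import math

def correlation(joint):
    """Correlation {X,Y} = sum_ij P_ij ln(P_ij / (P_i P_j)) for a 2-D joint table."""
    rows = [sum(r) for r in joint]            # marginal P_i
    cols = [sum(c) for c in zip(*joint)]      # marginal P_j
    total = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:                         # terms with P_ij = 0 contribute nothing
                total += p * math.log(p / (rows[i] * cols[j]))
    return total

# Product of the marginals (0.4, 0.6) and (0.3, 0.7): X and Y independent.
independent = [[0.12, 0.28],
               [0.18, 0.42]]
# Same uniform marginals, but the values tend to agree: X and Y dependent.
dependent = [[0.40, 0.10],
             [0.10, 0.40]]

print(correlation(independent))   # zero up to rounding
print(correlation(dependent))     # strictly positive
```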

§2.  Convex  function  inequalities 

We  shall  now  establish  some  basic  inequalities  which  follow  from  the 
convexity  of  the  function  x  In  x. 


Lemma 1. $x_i \ge 0$, $P_i \ge 0$, $\sum_i P_i = 1$

$$
\Rightarrow\quad
\Bigl(\sum_i P_i x_i\Bigr) \ln \Bigl(\sum_i P_i x_i\Bigr)
\le \sum_i P_i x_i \ln x_i .
$$

This property is usually taken as the definition of a convex function,¹
but follows from the fact that the second derivative of $x \ln x$ is positive
for all positive $x$, which is the elementary notion of convexity. There is
also an immediate corollary for the continuous case:

Corollary 1. $g(x) \ge 0$, $P(x) \ge 0$, $\int P(x)\,dx = 1$

$$
\Rightarrow\quad
\Bigl[\int P(x) g(x)\,dx\Bigr] \ln \Bigl[\int P(x) g(x)\,dx\Bigr]
\le \int P(x) g(x) \ln g(x)\,dx .
$$

1 See Hardy, Littlewood, and Pólya [13], p. 70.

We  can  now  derive  a  more  general  and  very  useful  inequality  from 
Lemma  1: 


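Lemma 2. $x_i \ge 0$, $a_i \ge 0$ (all $i$)

$$
\Rightarrow\quad
\Bigl(\sum_i x_i\Bigr) \ln \left(\frac{\sum_i x_i}{\sum_i a_i}\right)
\le \sum_i x_i \ln \frac{x_i}{a_i} .
$$

Proof: Let $P_i = a_i / \sum_i a_i$, so that $P_i \ge 0$ and $\sum_i P_i = 1$.
Then by Lemma 1:

$$
\Bigl[\sum_i P_i \frac{x_i}{a_i}\Bigr] \ln \Bigl[\sum_i P_i \frac{x_i}{a_i}\Bigr]
\le \sum_i P_i \frac{x_i}{a_i} \ln \frac{x_i}{a_i} .
\tag{2.1}
$$

Substitution for $P_i$ yields:

$$
\left[\frac{\sum_i x_i}{\sum_i a_i}\right] \ln \left[\frac{\sum_i x_i}{\sum_i a_i}\right]
\le \sum_i \frac{x_i}{\sum_i a_i} \ln \frac{x_i}{a_i} ,
$$

and, multiplying through by $\sum_i a_i$, we have proved the lemma.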
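
Lemma 2 can be spot-checked numerically. The sketch below is an illustration only (the helper name `log_sum_gap`, the random test vectors, and the tolerance are assumptions): it evaluates the right side minus the left side of the lemma for many random positive vectors and for one proportional pair, where equality is expected.

```python
import math
import random

def log_sum_gap(x, a):
    """Right side minus left side of Lemma 2; the lemma asserts this is >= 0."""
    lhs = sum(x) * math.log(sum(x) / sum(a))
    rhs = sum(xi * math.log(xi / ai) for xi, ai in zip(x, a) if xi > 0)
    return rhs - lhs

random.seed(0)
gaps = []
for _ in range(1000):
    n = random.randint(2, 6)
    x = [random.uniform(0.01, 5.0) for _ in range(n)]
    a = [random.uniform(0.01, 5.0) for _ in range(n)]
    gaps.append(log_sum_gap(x, a))

print(min(gaps) >= -1e-12)                               # no violation found
print(log_sum_gap([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))     # x proportional to a: gap is zero up to rounding
```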




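We also mention the analogous result for the continuous case:

Corollary 2. $f(x) \ge 0$, $g(x) \ge 0$ (all $x$)

$$
\Rightarrow\quad
\Bigl[\int f(x)\,dx\Bigr] \ln \left[\frac{\int f(x)\,dx}{\int g(x)\,dx}\right]
\le \int f(x) \ln \frac{f(x)}{g(x)}\,dx .
$$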


§3.  Refinement  theorems 

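We now supply the proof for Theorems 2 and 4 of Chapter II, which concern
the behavior of correlation and information upon refinement of the
distributions. We suppose that the original (unrefined) distribution is
$P_{ij\ldots k} = P(x_i,y_j,\ldots,z_k)$, and that the refined distribution is
${P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}$, where the original value $x_i$
for $X$ has been resolved into a number of values $x_i^{\mu_i}$, and similarly
for $Y,\ldots,Z$. Then:

$$
P_{ij\ldots k} = \sum_{\mu_i,\nu_j,\ldots,\eta_k} {P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k} ,
\qquad
P_i = \sum_{\mu_i} {P'}^{\mu_i}_i , \quad \text{etc.}
\tag{3.1}
$$

Computing the new correlation $\{X,Y,\ldots,Z\}'$ for the refined distribution
${P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}$ we find:

$$
\{X,Y,\ldots,Z\}' = \sum_{ij\ldots k}\ \sum_{\mu_i,\ldots,\eta_k}
{P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}
\ln \left( \frac{{P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}}
{{P'}^{\mu_i}_i\, {P'}^{\nu_j}_j \cdots {P'}^{\eta_k}_k} \right) .
\tag{3.2}
$$

However, by Lemma 2, §2:

$$
\Bigl(\sum_{\mu_i,\ldots,\eta_k} {P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}\Bigr)
\ln \left( \frac{\sum_{\mu_i,\ldots,\eta_k} {P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}}
{\sum_{\mu_i,\ldots,\eta_k} {P'}^{\mu_i}_i\, {P'}^{\nu_j}_j \cdots {P'}^{\eta_k}_k} \right)
\le \sum_{\mu_i,\ldots,\eta_k} {P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}
\ln \left( \frac{{P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}}
{{P'}^{\mu_i}_i\, {P'}^{\nu_j}_j \cdots {P'}^{\eta_k}_k} \right) .
\tag{3.3}
$$

Substitution of (3.3) into (3.2), noting that
$\sum_{\mu_i,\ldots,\eta_k} {P'}^{\mu_i}_i\, {P'}^{\nu_j}_j \cdots {P'}^{\eta_k}_k$
is equal to
$\bigl(\sum_{\mu_i} {P'}^{\mu_i}_i\bigr) \bigl(\sum_{\nu_j} {P'}^{\nu_j}_j\bigr) \cdots \bigl(\sum_{\eta_k} {P'}^{\eta_k}_k\bigr)$,
leads to:

$$
\{X,Y,\ldots,Z\}'
\ge \sum_{ij\ldots k} \Bigl(\sum_{\mu_i,\ldots,\eta_k} {P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}\Bigr)
\ln \left( \frac{\sum_{\mu_i,\ldots,\eta_k} {P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}}
{\sum_{\mu_i} {P'}^{\mu_i}_i\ \sum_{\nu_j} {P'}^{\nu_j}_j \cdots \sum_{\eta_k} {P'}^{\eta_k}_k} \right)
= \sum_{ij\ldots k} P_{ij\ldots k} \ln \frac{P_{ij\ldots k}}{P_i P_j \cdots P_k}
= \{X,Y,\ldots,Z\} ,
\tag{3.4}
$$

and we have completed the proof of Theorem 2 (Chapter II), which asserts that
refinement never decreases the correlation.²

2 Cf. Shannon [19], Appendix 7, where a quite similar theorem is proved.

We now consider the effect of refinement upon the relative information. We
shall use the previous notation, and further assume that
${a'}^{\mu_i}_i, {b'}^{\nu_j}_j, \ldots, {c'}^{\eta_k}_k$ are the information
measures for which we wish to compute the relative information of
${P'}^{\mu_i \nu_j \ldots \eta_k}_{ij\ldots k}$ and of $P_{ij\ldots k}$. The
information measures for the unrefined distribution $P_{ij\ldots k}$ then
satisfy the relations:

$$
a_i = \sum_{\mu_i} {a'}^{\mu_i}_i , \qquad
b_j = \sum_{\nu_j} {b'}^{\nu_j}_j , \quad \ldots
\tag{3.5}
$$

The relative information of the refined distribution is

$$
I'_{XY\ldots Z} = \sum_{ij\ldots k}\ \sum_{\mu_i,\ldots,\eta_k}
{P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}
\ln \left( \frac{{P'}^{\mu_i \ldots \eta_k}_{ij\ldots k}}
{{a'}^{\mu_i}_i\, {b'}^{\nu_j}_j \cdots {c'}^{\eta_k}_k} \right) ,
\tag{3.6}
$$

and by exactly the same procedure as we have just used for the correlation we
arrive at the result:

$$
I'_{XY\ldots Z} \ge \sum_{ij\ldots k} P_{ij\ldots k}
\ln \frac{P_{ij\ldots k}}{a_i b_j \cdots c_k} = I_{XY\ldots Z} ,
\tag{3.7}
$$

and we have proved that refinement never decreases the relative information
(Theorem 4, Chapter II).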

It is interesting to note that the relation (3.4) for the behavior of
correlation under refinement can be deduced from the behavior of relative
information, (3.7). This deduction is an immediate consequence of the fact
that the correlation is a relative information: the information of the joint
distribution relative to the product measure of the marginal distributions.
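
The refinement theorem for the correlation can be illustrated numerically for two variables. In the sketch below, which is an illustration under assumed data rather than anything from the text, a randomly generated 4 x 4 table plays the role of the refined distribution, and the unrefined distribution is obtained by summing over pairs of refined values as in (3.1). The names `correlation` and `coarsen` and the particular grouping are hypothetical.

```python
import math
import random

def correlation(joint):
    """Correlation {X,Y} = sum_ij P_ij ln(P_ij / (P_i P_j)) for a 2-D joint table."""
    rows = [sum(r) for r in joint]
    cols = [sum(c) for c in zip(*joint)]
    return sum(p * math.log(p / (rows[i] * cols[j]))
               for i, r in enumerate(joint)
               for j, p in enumerate(r) if p > 0)

def coarsen(joint, row_groups, col_groups):
    """Sum refined probabilities over each group of refined values, as in (3.1)."""
    return [[sum(joint[i][j] for i in rg for j in cg) for cg in col_groups]
            for rg in row_groups]

random.seed(1)
weights = [[random.random() for _ in range(4)] for _ in range(4)]
norm = sum(map(sum, weights))
refined = [[w / norm for w in row] for row in weights]      # refined distribution P'

groups = [(0, 1), (2, 3)]          # each original value is resolved into two refined values
unrefined = coarsen(refined, groups, groups)                # original distribution P

print(correlation(unrefined), correlation(refined))
```

The second printed number should never be smaller than the first, whatever seed or grouping is chosen; a single run is of course only a consistency check, not a proof.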


§4.  Monotone  decrease  of  information  for  stochastic  processes 

We consider a sequence of transition-probability matrices $T^n_{ij}$
($\sum_j T^n_{ij} = 1$ for all $n, i$, and $0 \le T^n_{ij} \le 1$ for all
$n, i, j$), and a sequence of measures $a^n_i$ ($a^n_i \ge 0$) having the
property that

$$ a^{n+1}_j = \sum_i a^n_i T^n_{ij} . \tag{4.1} $$

We further suppose that we have a sequence of probability distributions,
$P^n_i$, such that

$$ P^{n+1}_j = \sum_i P^n_i T^n_{ij} . \tag{4.2} $$

For each of these probability distributions the relative information $I^n$
(relative to the $a^n$ measure) is defined:

$$ I^n = \sum_i P^n_i \ln \frac{P^n_i}{a^n_i} . \tag{4.3} $$

Under these circumstances we have the following theorem:

Theorem. $I^{n+1} \le I^n$.

Proof: Expanding $I^{n+1}$ we get:

$$
I^{n+1} = \sum_j P^{n+1}_j \ln \frac{P^{n+1}_j}{a^{n+1}_j}
= \sum_j \Bigl(\sum_i P^n_i T^n_{ij}\Bigr)
\ln \left( \frac{\sum_i P^n_i T^n_{ij}}{\sum_i a^n_i T^n_{ij}} \right) .
\tag{4.4}
$$

However, by Lemma 2 (§2, Appendix I) we have the inequality

$$
\Bigl(\sum_i P^n_i T^n_{ij}\Bigr)
\ln \left( \frac{\sum_i P^n_i T^n_{ij}}{\sum_i a^n_i T^n_{ij}} \right)
\le \sum_i P^n_i T^n_{ij} \ln \frac{P^n_i T^n_{ij}}{a^n_i T^n_{ij}} .
\tag{4.5}
$$

Substitution of (4.5) into (4.4) yields:

$$
I^{n+1} \le \sum_j \sum_i P^n_i T^n_{ij} \ln \frac{P^n_i}{a^n_i}
= \sum_i P^n_i \Bigl(\sum_j T^n_{ij}\Bigr) \ln \frac{P^n_i}{a^n_i}
= \sum_i P^n_i \ln \frac{P^n_i}{a^n_i} = I^n ,
\tag{4.6}
$$

and the proof is completed.

This proof can be successively specialized to the case where $T$ is
stationary ($T^n_{ij} = T_{ij}$ for all $n$) and then to the case where $T$ is
doubly-stochastic ($\sum_i T_{ij} = 1$ for all $j$):

COROLLARY 1. $T^n_{ij}$ is stationary ($T^n_{ij} = T_{ij}$, all $n$), and the
measure $a_i$ is a stationary measure ($a_j = \sum_i a_i T_{ij}$), imply that
the information, $I^n = \sum_i P^n_i \ln (P^n_i / a_i)$, is monotone
decreasing. (As before, $P^{n+1}_j = \sum_i P^n_i T_{ij}$.)

Proof: Immediate consequence of preceding theorem.

COROLLARY 2. $T_{ij}$ is doubly-stochastic ($\sum_i T_{ij} = 1$, all $j$)
implies that the information relative to the uniform measure ($a_i = 1$, all
$i$), $I^n = \sum_i P^n_i \ln P^n_i$, is monotone decreasing.

Proof: For $a_i = 1$ (all $i$) we have that
$\sum_i a_i T_{ij} = \sum_i T_{ij} = 1 = a_j$. Therefore the uniform measure is
stationary in this case and the result follows from Corollary 1.

These results hold for the continuous case also, and may be easily verified
by replacing the above summations by integrations, and by replacing Lemma 2
by its corollary.
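
A small numerical experiment illustrates Corollary 2. The sketch below is an assumption-laden toy, not part of the text: the 3 x 3 matrix `T` is a hypothetical doubly-stochastic transition matrix, the starting distribution is arbitrary, and `step` and `information` are illustrative helper names implementing (4.2) and (4.3) with the uniform measure.

```python
import math

def step(p, T):
    """One application of (4.2): P_j^{n+1} = sum_i P_i^n T_ij."""
    return [sum(p[i] * T[i][j] for i in range(len(p))) for j in range(len(T[0]))]

def information(p, a):
    """Relative information (4.3); terms with P_i = 0 contribute nothing."""
    return sum(pi * math.log(pi / ai) for pi, ai in zip(p, a) if pi > 0)

# Hypothetical doubly-stochastic matrix: every row and every column sums to 1.
T = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
a = [1.0, 1.0, 1.0]          # uniform measure, stationary under T
p = [0.9, 0.1, 0.0]          # arbitrary starting distribution

history = []
for _ in range(12):
    history.append(information(p, a))
    p = step(p, T)

print(history)               # a non-increasing sequence, bounded below by -ln 3
```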


§5.  Proof  of  special  inequality  for  Chapter  IV  (1.7) 


LEMMA. Given probability densities $P(r)$, $P_1(x)$, $P_2(r)$, with
$P(r) = \int P_1(x) P_2(r - x\tau)\,dx$. Then $I_R \le I_X - \ln \tau$, where
$I_X = \int P_1(x) \ln P_1(x)\,dx$ and $I_R = \int P(r) \ln P(r)\,dr$.

Proof: We first note that:

$$
\int P_2(r - x\tau)\,dx = \int P_2(\omega)\,\frac{d\omega}{\tau} = \frac{1}{\tau}
\quad (\text{all } r)
\tag{5.1}
$$

and that furthermore

$$
\int P_2(r - x\tau)\,dr = \int P_2(\omega)\,d\omega = 1
\quad (\text{all } x) .
\tag{5.2}
$$

We now define the density $P^r(x)$:

$$ P^r(x) = \tau P_2(r - x\tau) , \tag{5.3} $$

which is normalized, by (5.1). Then, according to §2, Corollary 1 (Appendix I),
we have the relation:

$$
\Bigl[\int P^r(x) P_1(x)\,dx\Bigr] \ln \Bigl[\int P^r(x) P_1(x)\,dx\Bigr]
\le \int P^r(x) P_1(x) \ln P_1(x)\,dx .
\tag{5.4}
$$

Substitution from (5.3) gives

$$
\Bigl[\tau \int P_2(r - x\tau) P_1(x)\,dx\Bigr]
\ln \Bigl[\tau \int P_2(r - x\tau) P_1(x)\,dx\Bigr]
\le \tau \int P_2(r - x\tau) P_1(x) \ln P_1(x)\,dx .
\tag{5.5}
$$

The relation $P(r) = \int P_1(x) P_2(r - x\tau)\,dx$, together with (5.5), then
implies

$$
P(r) \ln \tau P(r) \le \int P_2(r - x\tau) P_1(x) \ln P_1(x)\,dx ,
\tag{5.6}
$$

which is the same as:

$$
P(r) \ln P(r) \le \int P_2(r - x\tau) P_1(x) \ln P_1(x)\,dx - P(r) \ln \tau .
\tag{5.7}
$$

Integrating with respect to $r$, and interchanging the order of integration on
the right side gives:

$$
I_R = \int P(r) \ln P(r)\,dr
\le \int \Bigl[\int P_2(r - x\tau)\,dr\Bigr] P_1(x) \ln P_1(x)\,dx
- (\ln \tau) \int P(r)\,dr .
\tag{5.8}
$$

But using (5.2) and the fact that $\int P(r)\,dr = 1$ this means that

$$
I_R \le \int P_1(x) \ln P_1(x)\,dx - \ln \tau = I_X - \ln \tau ,
\tag{5.9}
$$

and the proof of the lemma is completed.
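
For Gaussian densities the lemma can be checked in closed form. The sketch below is an illustration only, with assumed parameter values and helper names (`gaussian_information`, `lemma_sides`): if $P_1$ and $P_2$ are Gaussian with variances `var1` and `var2`, then $P(r) = \int P_1(x) P_2(r - x\tau)\,dx$ is Gaussian with variance $\tau^2\,\mathrm{var1} + \mathrm{var2}$, and a Gaussian of variance $v$ has information $-\tfrac{1}{2} \ln (2\pi e v)$.

```python
import math

def gaussian_information(var):
    """I = integral of p ln p for a Gaussian density of variance var."""
    return -0.5 * math.log(2 * math.pi * math.e * var)

def lemma_sides(var1, var2, tau):
    """Return (I_R, I_X - ln tau) for Gaussian P_1 (variance var1) and P_2 (variance var2)."""
    i_x = gaussian_information(var1)
    i_r = gaussian_information(tau ** 2 * var1 + var2)   # variance of r = tau * x + omega
    return i_r, i_x - math.log(tau)

for var1, var2, tau in [(1.0, 1.0, 1.0), (0.2, 3.0, 0.5), (5.0, 0.01, 4.0)]:
    i_r, bound = lemma_sides(var1, var2, tau)
    print(var1, var2, tau, i_r <= bound)
```

The gap between the two sides shrinks as `var2` becomes small compared with `tau ** 2 * var1`, that is, as $P_2$ becomes sharp on the scale set by $\tau$.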


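§6. Stationary point of $I_K + I_X$

We shall show that the information sum:

$$
I_K + I_X = \int_{-\infty}^{\infty} \phi^*\phi(k) \ln \phi^*\phi(k)\,dk
+ \int_{-\infty}^{\infty} \psi^*\psi(x) \ln \psi^*\psi(x)\,dx ,
\tag{6.1}
$$

where

$$ \phi(k) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ikx} \psi(x)\,dx , $$

is stationary for the functions:

$$
\psi_0(x) = \left(\frac{1}{2\pi\sigma_x^2}\right)^{1/4} e^{-x^2/4\sigma_x^2} ,
\qquad
\phi_0(k) = \left(\frac{2\sigma_x^2}{\pi}\right)^{1/4} e^{-k^2\sigma_x^2} ,
\tag{6.2}
$$

with respect to variations of $\psi$, $\delta\psi$, which preserve the
normalization:

$$ \int_{-\infty}^{\infty} \delta(\psi^*\psi)\,dx = 0 . \tag{6.3} $$

The variation $\delta\psi$ gives rise to a variation $\delta\phi$ of $\phi(k)$:

$$
\delta\phi = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ikx}\,\delta\psi\,dx .
\tag{6.4}
$$

To avoid duplication of effort we first calculate the variation $\delta I_\xi$
for an arbitrary wave function $u(\xi)$. By definition,

$$
I_\xi = \int_{-\infty}^{\infty} u^*(\xi) u(\xi) \ln u^*(\xi) u(\xi)\,d\xi ,
\tag{6.5}
$$

so that

$$
\delta I_\xi = \int_{-\infty}^{\infty} \bigl[u^*u\,\delta(\ln u^*u) + \delta(u^*u) \ln u^*u\bigr]\,d\xi
= \int_{-\infty}^{\infty} (1 + \ln u^*u)(u^*\delta u + u\,\delta u^*)\,d\xi .
\tag{6.6}
$$

We now suppose that $u$ has the real form:

$$ u(\xi) = a e^{-b\xi^2} = u^*(\xi) , \tag{6.7} $$

and from (6.6) we get

$$
\delta I_\xi = \int_{-\infty}^{\infty} (1 + \ln a^2 - 2b\xi^2)\, a e^{-b\xi^2}\,(\delta u)\,d\xi
+ \text{complex conjugate} .
\tag{6.8}
$$

We now compute $\delta I_K$ for $\phi_0$ using (6.8), (6.2), and (6.4):

$$
\delta I_K \big|_{\phi_0} = \int_{-\infty}^{\infty} (1 + \ln a'^2 - 2b'k^2)\, a' e^{-b'k^2}
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-ikx}\,\delta\psi\,dx\,dk + \text{c.c.} ,
\tag{6.9}
$$

where

$$ a' = \left(\frac{2\sigma_x^2}{\pi}\right)^{1/4} , \qquad b' = \sigma_x^2 . $$

Interchanging the order of integration and performing the definite integration
over $k$ we get:

$$
\delta I_K \big|_{\phi_0} = \int_{-\infty}^{\infty} \frac{a'}{\sqrt{2b'}}
\left(\ln a'^2 + \frac{x^2}{2b'}\right) e^{-(x^2/4b')}\,\delta\psi(x)\,dx + \text{c.c.} ,
\tag{6.10}
$$

while application of (6.8) to $\psi_0$ gives

$$
\delta I_X \big|_{\psi_0} = \int_{-\infty}^{\infty} (1 + \ln a''^2 - 2b''x^2)\, a'' e^{-b''x^2}\,\delta\psi(x)\,dx + \text{c.c.} ,
\tag{6.11}
$$

where

$$ a'' = \left(\frac{1}{2\pi\sigma_x^2}\right)^{1/4} , \qquad b'' = \frac{1}{4\sigma_x^2} . $$

Adding (6.10) and (6.11), and substituting for $a'$, $b'$, $a''$, $b''$, yields:

$$
\delta(I_K + I_X) \big|_{\psi_0} = (1 - \ln \pi) \int_{-\infty}^{\infty}
\left(\frac{1}{2\pi\sigma_x^2}\right)^{1/4} e^{-(x^2/4\sigma_x^2)}\,\delta\psi(x)\,dx + \text{c.c.}
\tag{6.12}
$$

But the integrand of (6.12) is simply $\psi_0(x)\,\delta\psi(x)$, so that

$$
\delta(I_K + I_X) \big|_{\psi_0} = (1 - \ln \pi) \int_{-\infty}^{\infty} \psi_0\,\delta\psi\,dx + \text{c.c.}
\tag{6.13}
$$

Since $\psi_0$ is real,
$\psi_0\,\delta\psi + \text{c.c.} = \psi_0^*\,\delta\psi + \text{c.c.} = \psi_0^*\,\delta\psi + \psi_0\,\delta\psi^* = \delta(\psi^*\psi)$,
so that

$$
\delta(I_K + I_X) \big|_{\psi_0} = (1 - \ln \pi) \int_{-\infty}^{\infty} \delta(\psi^*\psi)\,dx = 0 ,
\tag{6.14}
$$

due to the normality restriction (6.3), and the proof is completed.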
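
The value of $I_K + I_X$ at the stationary point can be checked by direct quadrature. The sketch below is an illustration under stated assumptions, not part of the proof: it uses the fact that $|\psi_0|^2$ and $|\phi_0|^2$ in (6.2) are Gaussian densities of variance $\sigma_x^2$ and $1/4\sigma_x^2$, and, for comparison, a hypothetical non-Gaussian state $\psi_1(x) \propto x e^{-x^2/2}$ (the first Hermite function, whose Fourier transform has the same modulus). The helper names, the integration range, and the value of `sigma_x2` are arbitrary choices.

```python
import math

def information(density, lo=-12.0, hi=12.0, n=24000):
    """Midpoint-rule estimate of the integral of p ln p; p ln p is taken as 0 where p = 0."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        p = density(lo + (i + 0.5) * h)
        if p > 0.0:
            total += p * math.log(p) * h
    return total

def gaussian(var):
    """Gaussian probability density of variance var, centered at zero."""
    return lambda t: math.exp(-t * t / (2 * var)) / math.sqrt(2 * math.pi * var)

sigma_x2 = 0.7                                        # arbitrary position variance
i_x = information(gaussian(sigma_x2))                 # |psi_0|^2
i_k = information(gaussian(1 / (4 * sigma_x2)))       # |phi_0|^2
stationary_value = -math.log(math.pi * math.e)

# |psi_1|^2 for psi_1(x) proportional to x exp(-x^2/2); |phi_1|^2 has the same form in k.
excited = lambda t: 2 / math.sqrt(math.pi) * t * t * math.exp(-t * t)
excited_sum = 2 * information(excited)

print(i_x + i_k, stationary_value)    # agree to quadrature accuracy, for any sigma_x2
print(excited_sum)                    # lies below the Gaussian value
```

That the comparison state gives a smaller sum is consistent with the Gaussian pair being a maximum of $I_K + I_X$, although a single example cannot establish this.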


REMARKS ON THE ROLE OF THEORETICAL PHYSICS

There have been lately a number of new interpretations of quantum mechanics,
most of which are equivalent in the sense that they predict the same results
for all physical experiments. Since there is therefore no hope of deciding
among them on the basis of physical experiments, we must turn elsewhere, and
inquire into the fundamental question of the nature and purpose of physical
theories in general. Only after we have investigated and come to some sort of
agreement upon these general questions, i.e., of the role of theories
themselves, will we be able to put these alternative interpretations in their
proper perspective.

Every theory can be divided into two separate parts, the formal part, and
the interpretive part. The formal part consists of a purely
logico-mathematical structure, i.e., a collection of symbols together with
rules for their manipulation, while the interpretive part consists of a set of
“associations,” which are rules which put some of the elements of the formal
part into correspondence with the perceived world. The essential point of a
theory, then, is that it is a mathematical model, together with an
isomorphism¹ between the model and the world of experience (i.e., the sense
perceptions of the individual, or the “real world,” depending upon one’s
choice of epistemology).


1 By isomorphism we mean a mapping of some elements of the model into
elements of the perceived world which has the property that the model is
faithful, that is, if in the model a symbol A implies a symbol B, and A
corresponds to the happening of an event in the perceived world, then the
event corresponding to B must also obtain. The word homomorphism would be
technically more correct, since there may not be a one-one correspondence
between the model and the external world.

The model nature is quite apparent in the newest theories, as in nuclear
physics, and particularly in those fields outside of physics proper, such as
the Theory of Games, various economic models, etc., where the degree of
applicability of the models is still a matter of considerable doubt. However,
when a theory is highly successful and becomes firmly established, the model
tends to become identified with “reality” itself, and the model nature of the
theory becomes obscured. The rise of classical physics offers an excellent
example of this process. The constructs of classical physics are just as much
fictions of our own minds as those of any other theory; we simply have a great
deal more confidence in them. It must be deemed a mistake, therefore, to
attribute any more “reality” here than elsewhere.

Once we have granted that any physical theory is essentially only a model
for the world of experience, we must renounce all hope of finding anything
like “the correct theory.” There is nothing which prevents any number of quite
distinct models from being in correspondence with experience (i.e., all
“correct”), and furthermore no way of ever verifying that any model is
completely correct, simply because the totality of all experience is never
accessible to us.

Two types of prediction can be distinguished: the prediction of phenomena
already understood, in which the theory plays simply the role of a device for
compactly summarizing known results (the aspect of most interest to the
engineer), and the prediction of new phenomena and effects, unsuspected before
the formulation of the theory. Our experience has shown that a theory often
transcends the restricted field in which it was formulated. It is this
phenomenon (which might be called the “inertia” of theories) which is of most
interest to the theoretical physicist, and supplies a greater motive to theory
construction than that of aiding the engineer.

From the viewpoint of the first type of prediction we would say that the
“best” theory is the one from which the most accurate predictions can be most
easily deduced, two not necessarily compatible ideals. Classical physics, for
example, permits deductions with far greater ease than the more accurate
theories of relativity and quantum mechanics, and in such a case we must
retain them all. It would be the worst sort of folly to advocate that the
study of classical physics be completely dropped in favor of the newer
theories. It can even happen that several quite distinct models can exist
which are completely equivalent in their predictions, such that different ones
are most applicable in different cases, a situation which seems to be realized
in quantum mechanics today. It would seem foolish to attempt to reject all but
one in such a situation, where it might be profitable to retain them all.

Nevertheless, we have a strong desire to construct a single all-embracing
theory which would be applicable to the entire universe. From what does this
desire stem? The answer lies in the second type of prediction, the discovery
of new phenomena, and involves the consideration of inductive inference and
the factors which influence our confidence in a given theory (to be applicable
outside of the field of its formulation). This is a difficult subject, and one
which is only beginning to be studied seriously. Certain main points are
clear, however, for example, that our confidence increases with the number of
successes of a theory. If a new theory replaces several older theories which
deal with separate phenomena, i.e., a comprehensive theory of the previously
diverse fields, then our confidence in the new theory is very much greater
than the confidence in any of the older theories, since the range of success
of the new theory is much greater than that of any of the older ones. It is
therefore this factor of confidence which seems to be at the root of the
desire for comprehensive theories.

A closely related criterion is simplicity, by which we refer to conceptual
simplicity rather than ease in use, which is of paramount interest to the
engineer. A good example of the distinction is the theory of general
relativity, which is conceptually quite simple, while enormously cumbersome in
actual calculations. Conceptual simplicity, like comprehensiveness, has the
property of increasing confidence in a theory. A theory containing many ad hoc
constants and restrictions, or many independent hypotheses, in no way
impresses us as much as one which is largely free of arbitrariness.

It is necessary to say a few words about a view which is sometimes
expressed, the idea that a physical theory should contain no elements which do
not correspond directly to observables. This position seems to be founded on
the notion that the only purpose of a theory is to serve as a summary of known
data, and overlooks the second major purpose, the discovery of totally new
phenomena. The major motivation of this viewpoint appears to be the desire to
construct perfectly “safe” theories which will never be open to contradiction.
Strict adherence to such a philosophy would probably seriously stifle the
progress of physics.

The critical examination of just what quantities are observable in a theory
does, however, play a useful role, since it gives an insight into ways of
modification of a theory when it becomes necessary. A good example of this
process is the development of Special Relativity. Such successes of the
positivist viewpoint, when used merely as a tool for deciding which
modifications of a theory are possible, in no way justify its universal
adoption as a general principle which all theories must satisfy.

In summary, a physical theory is a logical construct (model), consisting of
symbols and rules for their manipulation, some of whose elements are
associated with elements of the perceived world. The fundamental requirements
of a theory are logical consistency and correctness. There is no reason why
there cannot be any number of different theories satisfying these
requirements, and further criteria such as usefulness, simplicity,
comprehensiveness, picturability, etc., must be resorted to in such cases to
further restrict the number. Even so, it may be impossible to give a total
ordering of the theories according to “goodness,” since different ones may
rate highest according to the different criteria, and it may be most
advantageous to retain more than one.

As a final note, we might comment upon the concept of causality. It should
be clearly recognized that causality is a property of a model, and not a
property of the world of experience. The concept of causality only makes sense
with reference to a theory, in which there are logical dependences among the
elements. A theory contains relations of the form “A implies B,” which can be
read as “A causes B,” while our experience, uninterpreted by any theory, gives
nothing of the sort, but only a correlation between the event corresponding to
B and that corresponding to A.